Roberto Rodriguez

Papers by Roberto Rodriguez

A Combined Memory Compression And Hierarchical Motion Estimation Architecture For Video Encoding In Embedded Systems

9th EUROMICRO Conference on Digital System Design (DSD'06), 2006

A Unified Architecture for H.264 Multiple Block-Size DCT with Fast and Low Cost Quantization

9th EUROMICRO Conference on Digital System Design (DSD'06), 2006

AVC/H.264 is the new international standard for video coding jointly developed by ISO-MPEG and ITU-T, which offers a substantial compression gain when compared with H.263 and MPEG-4 Simple Profile. One of the main characteristics of H.264 is the introduction of an integer version of the discrete cosine transform, initially applied to 4×4 pixel blocks and later extended to 8×8 pixel blocks for high-quality video encoding. In this work, a unified architecture is proposed for parallel 8×8 integer DCT and iDCT, also able to process the 4×4 DCT, iDCT and Hadamard transform. A very fast quantization/de-quantization scheme based on prediction is presented that allows parallel quantization with a single multiplier. This architecture also implements all-zero detection, eliminating coefficients with high cost as specified in the standard, and anticipates entropy encoding. The proposed design has been synthesized in AMS 0.35 µm technology and achieves a maximum speed of 67 MHz.
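As a rough illustration of the 4×4 integer transform mentioned in the abstract above, the following sketch applies the well-known H.264 forward core matrix to a block in software. It is not the hardware architecture proposed in the paper; the function names are hypothetical, and the scaling that the standard folds into quantization is deliberately left out.

```python
# Forward 4x4 integer core transform matrix of H.264.
# The normalisation factors are folded into quantization and are NOT applied here.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows (integer arithmetic only)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform_4x4(block):
    """Compute Y = CF * X * CF^T for a 4x4 block of integer residuals."""
    return matmul(matmul(CF, block), transpose(CF))


def is_all_zero(coeffs):
    """Software analogue of an all-zero check on a coefficient block."""
    return all(c == 0 for row in coeffs for c in row)
```

For a flat block (all samples equal), only the DC coefficient is non-zero, which is the kind of case where an all-zero check on the remaining coefficients becomes useful.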

New model for arithmetic coding/decoding of multilevel images based on a cache memory

ICECS'99. Proceedings of ICECS '99. 6th IEEE International Conference on Electronics, Circuits and Systems (Cat. No.99EX357)

In this work we present new methodologies for arithmetic encoding and decoding of multilevel images, achieving important improvements in cycle length and reducing complexity. Entropy coding methods must carry out maintenance and search operations in tables, the size of which depends on the number of symbols of the alphabet. In this work we reduce the size of the table by introducing a new memory level, a cache. We obtain favourable speed-up and hardware savings, especially in the decoder. In some implementations the memory can be reduced to the cache, eliminating the RAM. Furthermore, the new scheme enables us to obtain excellent compression ratios.

Architectures for arithmetic coding in image compression

2000 10th European Signal Processing Conference, 2000

In this work we present and evaluate new architectures for the arithmetic encoding and decoding of multilevel images. Arithmetic coding is of great interest due to the excellent results that it gives. On the other hand, the complexity of its implementation has always counted against it, and its applications usually suffer from a high computational cost, slowness or both. By introducing a new memory scheme, based on a cache memory, we solve the classic drawbacks of multilevel arithmetic coding hardware, obtaining architectures that are simpler and faster than the previous ones.

An efficient ant colony optimization framework for HPC environments

Applied Soft Computing, 2021

Comparison of Hardwired and Microprogrammed Statechart Implementations

Electronics, 2020

In scientific facilities such as particle accelerators, fast and jitter-free synchronization is required in order to trigger a large number of actuators at the right time in a variety of situations. The behaviour of the control systems and subsystems may be specified by using statechart diagrams, which expand the capabilities of finite state machines by allowing concurrency, a hierarchy of states, and history. Hence, there is a need for tools that synthesize those diagrams so that a new control configuration may be deployed quickly and without errors in the required environments. In this work, we present a tool that analyses the specification of a variant of the State Chart XML (SCXML) standard tailored to hardware systems and produces hardware description language (HDL) code suited to implementing the required control systems using FPGAs. A number of solutions are provided to deal with the specific features of statecharts, such as multiple triggering events and concurrent...

Truncated SIMD Multiplier Architecture for Approximate Computing in Low-Power Programmable Processors

IEEE Access, 2019

A VLSI implementation of an arithmetic coder for image compression

EUROMICRO 97. Proceedings of the 23rd EUROMICRO Conference: New Frontiers of Information Technology (Cat. No.97TB100167)

Arithmetic coding is an efficient data compression technique. This paper describes the VLSI implementation of an arithmetic coder for a multilevel alphabet (256 symbols). The design we propose is based on the use of redundant arithmetic and the development of new schemes for storing and updating the cumulative probabilities and updating the range and left point of the interval. The use of redundant arithmetic reduces the delays of the modules, so the speed of the design is improved. The...

Transaction level and RTL modeling of an architecture for network data compression within ethernet switches in large file transfer scenarios

2016 Conference on Design of Circuits and Integrated Systems (DCIS), 2016

Storage networks have become major components of modern data centers. In some applications, moving huge amounts of data between servers and storage devices challenges the architecture of the data center. Therefore, there is a growing interest in data compression applied to reduce the volume of data transfers in storage networks. Because of latency, hardware is often preferred over software-based compression. However, the administration overhead and material cost required to furnish every server and storage device with a compression card are prohibitive. In this work, the architecture and implementation of a compressor-decompressor is presented. Then, the data flow is analyzed using Transaction Level Modeling in SystemC. The conclusions of that analysis are used to design an Ethernet switch in which data is compressed and decompressed as it flows between servers and storage devices in the network. The proposed system implements resource sharing, transparent use, and minimal latency on top of the benefits of data compression. This work is meant to be extended to other applications beyond data compression, opening a new field for hardware-based accelerators that will be located in the network rather than in individual nodes.

Modular architecture for multiple transforms in modern video standards

2016 Conference on Design of Circuits and Integrated Systems (DCIS), 2016

Modern video standards such as H.264 and HEVC introduce new simplified transform functions that allow for simple hardware implementation, different block sizes and enhanced coding efficiency. However, the number of different transforms to implement has increased, leading to the need for shared architectures able to process several transforms with minimum hardware overhead. This trend started with H.264, and continued with the new transforms in HEVC. Additionally, other video codecs such as VC1 and AVS should also be supported, together with the new ones still to appear. Therefore, it seems that new architectures will be necessary for each new generation of codecs, and that hardware sharing will continue to be a must. In this work, we propose a modular architecture that offers great flexibility and that permits extending to larger transform sizes and scaling to higher levels of performance by just enabling or implementing more instances. The basic programmable module is introduced, together with techniques to support different transform sizes. Then, an evaluation of the performance for different transform functions is presented.

Pipelined FPGA implementation of numerical integration of the Hodgkin-Huxley model

2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2016

The Hodgkin-Huxley model describes the initiation and propagation of action potentials in neurons' axons. The model consists of a set of nonlinear differential equations that can be solved using numerical methods for a given choice of parameters. As the equations reflect physiological processes, the values of those parameters are subject to great variability. Therefore, numerical integration is often combined with differential evolution methods in order to find which set of parameters minimizes some fitness function. As modern FPGAs are large enough to implement complex functions using double-precision floating-point arithmetic, intensive scientific computations may be carried out showing competitive performance and cost. In this work, we present a pipelined architecture for performing the 4th order Runge-Kutta integration of the equations of the Hodgkin-Huxley model, introducing convenient implementations of complex mathematical functions.
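For readers unfamiliar with the integration scheme named in the abstract above, here is a minimal software sketch of one classical 4th order Runge-Kutta step for a generic system of ordinary differential equations. The names `rk4_step` and `integrate` are hypothetical, the Hodgkin-Huxley right-hand side is not reproduced, and nothing here reflects the pipelined FPGA design itself.

```python
import math


def rk4_step(f, t, y, h):
    """Advance the state vector y (a list of floats) from t to t + h.

    f(t, y) must return the list of derivatives dy/dt.
    """
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
    k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
    # Weighted average of the four slope estimates.
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def integrate(f, t0, y0, h, steps):
    """Apply rk4_step repeatedly and return the final state."""
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


# Toy usage: dy/dt = y with y(0) = 1, whose exact solution is exp(t).
approx = integrate(lambda t, y: [y[0]], 0.0, [1.0], 0.01, 100)
error = abs(approx[0] - math.e)
```

A model such as Hodgkin-Huxley would plug in by supplying its own `f` that returns the derivatives of the membrane potential and gating variables.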

A fast algorithm for constructing nearly optimal prefix codes

Software: Practice and Experience, 2015

The Huffman algorithm constructs optimal prefix codes with O(n log n) complexity. As the number of symbols n grows, so does the complexity of building the codewords. In this paper, a new algorithm and implementation are proposed that achieve nearly optimal coding without sorting the probabilities or building a tree of codes. The complexity is proportional to the maximum code length, making the algorithm especially attractive for large alphabets. The focus is put on achieving almost optimal coding with a fast implementation, suitable for real-time compression of large volumes of data. A practical case example about checkpoint file compression is presented, providing encouraging results. Copyright © 2015 John Wiley & Sons, Ltd.
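For context, the classical Huffman baseline that the abstract above compares against can be sketched as follows. This is only the textbook heap-based construction, computing code lengths rather than codewords; it is not the nearly optimal algorithm proposed in the paper, and `huffman_code_lengths` is a hypothetical name.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return {symbol: code length} for a dict mapping symbols to positive weights.

    Illustrative only: merging symbol lists keeps the code short but adds
    overhead compared with a tuned tree-based implementation.
    """
    if len(weights) == 1:
        # A single symbol still needs one bit to be representable.
        return {next(iter(weights)): 1}
    tie = count()  # tie-breaker so the heap never compares symbol lists
    heap = [(w, next(tie), [s]) for s, w in weights.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(weights, 0)
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for s in group1 + group2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), group1 + group2))
    return lengths


lengths = huffman_code_lengths({"a": 8, "b": 4, "c": 2, "d": 1, "e": 1})
# The Kraft sum of a complete prefix code equals 1.
kraft = sum(2.0 ** -n for n in lengths.values())
```

The repeated heap operations are where the O(n log n) cost comes from, which is the step the abstract says the proposed method avoids.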

Bitstream Syntax Description Language for 3D MPEG-4 view-dependent texture streaming

Proceedings. International Conference on Image Processing

Arithmetic image coding/decoding architecture based on a cache memory

Proceedings. 24th EUROMICRO Conference (Cat. No.98EX204)

We present a new arithmetic coding algorithm based on a small cache memory. The complexity of multilevel arithmetic coding has been reduced by restricting the operations to those symbols stored in the cache. We analyze the best organisation of the cache, trying out different configurations, associativity and replacement algorithms. Finally, a new architecture for encoding and decoding has been...

High Speed 4-Symbol Arithmetic Encoder Architecture for Embedded Zero Tree-Based Compression

... Roberto R. Osorio and Bart Vanhoof, IMEC, DESICS, Kapeldreef 75, B-3001 Leuven, Belgium ...

Architecture and Implementation of a Data Compression System at Switch-Level in ATA-over-Ethernet Storage Networks

2013 Euromicro Conference on Digital System Design, 2013

In this work, a new architecture for lossless data compression and decompression is integrated within an Ethernet switch using the NetFPGA open platform. The aim is to compress data packets in a block-based storage network. Data packets are compressed when written to the target disk and decompressed when read by the initiator. ATA-over-Ethernet (AoE) has been chosen as it is an efficient and relatively simple technology that does not rely on IP. The ultimate goal is achieving a better use of the available network bandwidth with the target and a possible reduction in power consumption. The use case of application-level checkpointing in supercomputing is presented, for which compression ratios are given, and the efficiency of the proposed scheme is then discussed.

Method and system for transmitting digital signals between nodes of a network for optimized processing at the receiving node

Entropy Coding on a Programmable Processor Array for Multimedia SoC

2007 IEEE International Conf. on Application-specific Systems, Architectures and Processors (ASAP), 2007

High Performance Image Processing on a Massively Parallel Processor Array

2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools, 2009

Multicore and manycore processors are the new wave of computing, offering high performance by using large numbers of simple processors. In this paper, we describe the implementation of two applications on an Ambric massively parallel processor array from a hardware design point of view. An evaluation of performance and design effort is provided, showing that massively parallel processor arrays may challenge FPGAs in some applications.

A digital cellular-based system for retinal vessel-tree extraction

2009 European Conference on Circuit Theory and Design, 2009

César Díaz Resco, Alej...
