9th EUROMICRO Conference on Digital System Design (DSD'06), 2006
AVC/H.264 is the new international standard for video coding jointly developed by ISO-MPEG and ITU-T, which offers a substantial compression gain when compared with H.263 and MPEG-4 simple profile. One of the main characteristics of H.264 is the introduction of an integer version of the discrete cosine transform, initially applied to 4×4 pixel blocks and later extended to 8×8 pixel blocks for high-quality video encoding. In this work, a unified architecture is proposed for parallel 8×8 integer DCT and iDCT, also able to process 4×4 DCT, iDCT and Hadamard transform. A very fast quantization/de-quantization scheme based on prediction is presented, which allows parallel quantization with a single multiplier. This architecture also implements all-zero detection, eliminating coefficients with high cost as specified in the standard, and anticipates entropy encoding. The proposed design has been synthesized in AMS 0.35 µm technology and achieves a maximum speed of 67 MHz
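As a point of reference for the integer transform mentioned in this abstract, the following is a minimal software sketch of the standard 4×4 forward core transform of H.264, Y = C·X·Cᵀ. It is not the hardware architecture proposed in the paper; the function names (`matmul`, `transpose`, `forward_transform_4x4`) are illustrative assumptions, and the scaling that H.264 folds into quantization is omitted.

```python
# Sketch only: 4x4 forward integer core transform of H.264, Y = C * X * C^T.
# Function names are hypothetical; post-scaling/quantization is not modelled.

C4 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows (integer arithmetic)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_transform_4x4(block):
    """Apply the integer core transform to a 4x4 block of residual samples."""
    return matmul(matmul(C4, block), transpose(C4))


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    for row in forward_transform_4x4(flat):
        print(row)
```

Because the matrix contains only the values ±1 and ±2, a hardware version needs only additions and shifts, which is what makes shared transform datapaths of this kind attractive.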
ICECS'99. Proceedings of ICECS '99. 6th IEEE International Conference on Electronics, Circuits and Systems (Cat. No.99EX357)
In this work we present new methodologies for arithmetic encoding and decoding of multilevel images, achieving important improvements in cycle length and reducing complexity. Entropy coding methods must maintain and search tables whose size depends on the number of symbols in the alphabet. In this work we reduce the size of the table by introducing a new memory level, a cache. We obtain favourable speed-up and hardware savings, especially in the decoder. In some implementations the memory can be reduced to the cache, eliminating the RAM. Furthermore, the new scheme enables us to obtain excellent compression ratios.
2000 10th European Signal Processing Conference, 2000
In this work we present and evaluate new architectures for the arithmetic encoding and decoding of multilevel images. Arithmetic coding is of great interest due to the excellent results that it gives. On the other hand, the complexity of its implementation has always counted against it, and its applications usually suffer from a high computational cost, slowness or both. By introducing a new memory scheme, based on a cache memory, we solve the classic drawbacks of multilevel arithmetic coding hardware, obtaining architectures that are simpler and faster than the previous ones.
In scientific facilities such as particle accelerators, fast and jitter-free synchronization is required in order to trigger a large number of actuators at the right time in a variety of situations. The behaviour of the control systems and subsystems may be specified by using statechart diagrams, which expand the capabilities of finite state machines by allowing concurrency, a hierarchy of states, and history. Hence, there is a need for tools that synthesize those diagrams so that a new control configuration may be deployed quickly and without errors in the required environments. In this work, we present a tool that analyses the specification of a variant of the State Chart XML (SCXML) standard tailored to hardware systems and produces hardware description language (HDL) code suited to implementing the required control systems using FPGAs. A number of solutions are provided to deal with the specific features of statecharts, such as multiple triggering events and concurrent...
EUROMICRO 97. Proceedings of the 23rd EUROMICRO Conference: New Frontiers of Information Technology (Cat. No.97TB100167)
Arithmetic coding is an efficient data compression technique. This paper describes the VLSI implementation of an arithmetic coder for a multilevel alphabet (256 symbols). The design we propose is based on the use of redundant arithmetic and the development of new schemes for storing and updating the cumulative probabilities and updating the range and left point of the interval. The use of redundant arithmetic reduces the delays of the modules, so the speed of the design is improved. The...
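To make the terms "cumulative probabilities", "range" and "left point of the interval" concrete, here is a minimal textbook sketch of the interval update in arithmetic coding. It assumes a static frequency table and exact rational arithmetic, so it models neither the adaptive table updates nor the redundant arithmetic described in the paper; the name `encode_interval` and its parameters are hypothetical.

```python
# Sketch only: textbook arithmetic-coding interval update with a static model.
# Exact fractions are used for clarity; real coders use finite precision.
from fractions import Fraction


def encode_interval(symbols, freq):
    """Return (left point, range) of the interval that encodes `symbols`.

    `freq` maps each symbol to its (fixed) frequency count.
    """
    total = sum(freq.values())

    # Cumulative frequency of all symbols that sort before each symbol.
    cum = {}
    acc = 0
    for s in sorted(freq):
        cum[s] = acc
        acc += freq[s]

    low, rng = Fraction(0), Fraction(1)
    for s in symbols:
        low += rng * Fraction(cum[s], total)  # move the left point
        rng *= Fraction(freq[s], total)       # shrink the range
    return low, rng


if __name__ == "__main__":
    print(encode_interval("ab", {"a": 1, "b": 1}))
```

With a 256-symbol alphabet, looking up and updating the cumulative counts is the costly part of each step, which is why the storage scheme for those counts matters in hardware.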
2016 Conference on Design of Circuits and Integrated Systems (DCIS), 2016
Storage networks have become major components of modern data centers. In some applications, moving huge amounts of data between servers and storage devices challenges the architecture of the data center. Therefore, there is a growing interest in data compression applied to reduce the volume of data transfers in storage networks. Because of latency, hardware is often preferred over software-based compression. However, the administration overhead and material cost required to furnish every server and storage device with a compression card are prohibitive. In this work, the architecture and implementation of a compressor-decompressor are presented. Then, the data flow is analyzed using Transaction Level Modeling in SystemC. The conclusions of that analysis are used to design an Ethernet switch in which data is compressed and decompressed as it flows between servers and storage devices in the network. The proposed system provides resource sharing, transparent use, and minimal latency on top of the benefits of data compression. This work is meant to be extended to other applications beyond data compression, opening a new field for hardware-based accelerators that will be located in the network rather than in individual nodes.
2016 Conference on Design of Circuits and Integrated Systems (DCIS), 2016
Modern video standards such as H.264 and HEVC introduce new simplified transform functions that allow for simple hardware implementation, different block sizes and enhanced coding efficiency. However, the number of different transforms to implement has increased, leading to the need for shared architectures able to process several transforms with minimum hardware overhead. This trend started with H.264 and continued with the new transforms in HEVC. Additionally, other video codecs such as VC1 and AVS should also be supported, together with new ones still to appear. Therefore, it seems that new architectures will be necessary for each new generation of codecs, and that hardware sharing will continue to be a must. In this work, we propose a modular architecture that offers great flexibility and that permits extending to larger transform sizes and scaling to higher levels of performance by just enabling or implementing more instances. The basic programmable module is introduced, together with techniques to support different transform sizes. Then, an evaluation of the performance for different transform functions is presented.
2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2016
The Hodgkin-Huxley model describes the initiation and propagation of action potentials in neurons' axons. The model consists of a set of nonlinear differential equations that can be solved using numerical methods for a given choice of parameters. As the equations reflect physiological processes, the values of those parameters are subject to great variability. Therefore, numerical integration is often combined with differential evolution methods in order to find which set of parameters minimizes some fitness function. As modern FPGAs are large enough to implement complex functions using double-precision floating-point arithmetic, intensive scientific computations may be carried out with competitive performance and cost. In this work, we present a pipelined architecture for performing the 4th order Runge-Kutta integration of the equations of the Hodgkin-Huxley model, introducing convenient implementations of complex mathematical functions.
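For readers unfamiliar with the integration method named in this abstract, the following is a minimal software sketch of the classical 4th order Runge-Kutta step for a system of ordinary differential equations. It is not the pipelined FPGA architecture of the paper, and the Hodgkin-Huxley right-hand side is deliberately not reproduced; a simple decay equation stands in for it. The names `rk4_step`, `integrate` and `decay` are hypothetical.

```python
# Sketch only: classical 4th order Runge-Kutta for y' = f(t, y),
# with y given as a list of floats. The test ODE is a stand-in,
# not the Hodgkin-Huxley equations.


def rk4_step(f, t, y, h):
    """Advance the state y from time t to t + h with one RK4 step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def integrate(f, y0, t0, t1, steps):
    """Integrate from t0 to t1 using a fixed number of RK4 steps."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    """Stand-in right-hand side: y' = -y, whose exact solution is exp(-t)."""
    return [-yi for yi in y]


if __name__ == "__main__":
    print(integrate(decay, [1.0], 0.0, 1.0, 10))
```

Each step evaluates the right-hand side four times, with every evaluation depending on the previous one, which gives a sense of why pipelining the function evaluations is a natural hardware design choice.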
We present a new arithmetic coding algorithm based on a small cache memory. The complexity of multilevel arithmetic coding has been reduced by restricting the operations to those symbols stored in the cache. We analyze the best organisation of the cache, trying out different configurations, associativity and replacement algorithms. Finally, a new architecture for encoding and decoding has been
... ROBERTO R. OSORIO AND BART VANHOOF IMEC, DESICS, Kapeldreef, 75, B-3001 Leuven, Belgium ... Roberto.Osorio@imec.be Bart Vanhoof received the electrical engineering degree from the Katholieke Universiteit Leuven, Belgium in '89. ...
2013 Euromicro Conference on Digital System Design, 2013
In this work, a new architecture for lossless data compression and decompression is integrated within an Ethernet switch using the NetFPGA open platform. The aim is to compress data packets in a block-based storage network. Data packets are compressed when written to the target disk and decompressed when read by the initiator. ATA-over-Ethernet (AoE) has been chosen as it is an efficient and relatively simple technology that does not rely on IP. The ultimate goal is to achieve a better use of the available network bandwidth with the target and a possible reduction in power consumption. The use case of application-level checkpointing in supercomputing is presented, for which compression ratios are given, and the efficiency of the proposed scheme is then discussed.
2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools, 2009
Multicore and manycore processors are the new wave of computing, offering high performance by using large numbers of simple processors. In this paper, we describe the implementation of two applications on an Ambric massively parallel processor array from a hardware design point of view. An evaluation of performance and design effort is provided, showing that massively parallel processor arrays may challenge FPGAs in some applications.
2009 European Conference on Circuit Theory and Design, 2009
A Digital Cellular-Based System for Retinal Vessel-Tree Extraction César Díaz Resco, Alejandro Nieto, Roberto R. Osorio, Victor M. Brea, David L. Vilarino Department of Electronics and Computer Science University ...
Papers by Roberto Rodriguez