The MultiNoC system implements a programmable on-chip multiprocessing platform built on top of an efficient, low area overhead intra-chip interconnection scheme. The employed interconnection structure is a Network on Chip, or NoC. NoCs are emerging as a viable answer to the increasing demands on interconnection architectures, due to the following characteristics: (i) energy efficiency and reliability; (ii) scalability of bandwidth, when compared to traditional bus architectures; (iii) reusability; (iv) distributed routing decisions. An external host computer feeds MultiNoC with application instructions and data. After this initialization procedure, MultiNoC executes the loaded algorithm. After execution finishes, the host can read back the output data. Sequential or parallel algorithms conveniently adapted to the MultiNoC structure can be executed. The main motivation for proposing this design is to enable the investigation of current trends to increase the number of embedded processors in SoCs, leading to the concept of "sea of processors" systems.
JICS. Journal of integrated circuits and systems, May 22, 2023
The design of digital circuits on recent technologies brings several challenges, among which robustness to variations stands out. Variation sources are multiple, and the evolution of integrated circuit fabrication techniques increases the number and relevance of such sources, as well as the complexity of ensuring circuit robustness against them. Some design paradigms naturally counter variations of one or more types. Asynchronous self-timed design is one such paradigm, which can provide robustness to process, voltage, temperature, ageing and IR drop variations, to cite some of the main types. This paper proposes an enhancement to the Pulsar environment, a recently proposed open source automated flow for the design of self-timed clockless circuits. The six components proposed here enable describing choices and decisions on the flow of data tokens inside asynchronous circuits. Design capture in Pulsar can then employ these components. To implement the abstract (synthesis-enabled) components, the paper also proposes the handshaking mutex, a versatile complex gate that eases the design of the probe and the arbiter, the two most complex among the new components. Results demonstrate that the new version of Pulsar is more powerful than the previous, baseline version, enabling the design capture and the automated synthesis of more complex asynchronous self-timed circuits. They also indicate that the handshaking mutex operates correctly, and with a good level of attested fairness.
Analog Integrated Circuits and Signal Processing, 2021
The current state of the telecommunications market exhibits a high potential to absorb efficient innovations in wireless connectivity, especially those that can be applied to the Internet of Things and similar domains. Contributing in that direction, this paper describes the design and implementation of a fully differential impulse-radio ultra-wideband (IR-UWB) transmitter using pulse-amplitude modulation, with an adaptive power spectrum density (PSD). The architecture can produce up to eight differential monocycles per clock pulse at its output. The number of monocycles controls the bandwidth (thus the PSD) in the mask of IR-UWB technologies, allowing adaptation to multiple standards. The complete transmitter has four main blocks: (a) a pulse generator, comprising two pulse generating circuit groups, to modulate and create a rectangular waveform; (b) an active balun with two amplifiers, to generate differential signals; (c) a digital demultiplexer, to alternate data to the pulse generating circuit groups; (d) a binary-to-thermometer decoder, to control the number of generated monocycles per pulse. Simulations demonstrate an output pulse amplitude of 120 mV for the high logic level and of 70 mV for the low logic level, both at a 100 MHz Pulse Repetition Frequency. This produces a mean pulse duration of 277 ps, a mean central frequency of 3.8 GHz, and a mean power consumption of 6.7 mW. The transmitter takes the form of an intellectual property core in a 130 nm CMOS technology. The complete transmitter area is 0.067 mm², without I/O pads. The outcomes suggest that the proposed circuit can narrow or widen the output signal bandwidth, providing adaptability to different emission requirements.
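The role of the binary-to-thermometer decoder mentioned in block (d) can be illustrated with a small behavioural sketch. It is not the transmitter's circuit: the function name, the 0 to 8 input range, and the mapping of one output bit per monocycle-enable line are assumptions made only for illustration.

```python
def binary_to_thermometer(value: int, width: int = 8) -> list[int]:
    """Return a thermometer code: the first `value` outputs are 1, the rest 0.

    Assumption: each output bit enables one monocycle, so `value` is the
    number of monocycles requested per pulse (0..width).
    """
    if not 0 <= value <= width:
        raise ValueError(f"value must be in 0..{width}, got {value}")
    return [1 if i < value else 0 for i in range(width)]


if __name__ == "__main__":
    # Request 3 monocycles out of a maximum of 8.
    print(binary_to_thermometer(3))
```

A thermometer code has the convenient property that the count of active outputs equals the requested number, which is why it suits enabling a variable number of identical units.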
The periodic nature of the global clock in traditional synchronous designs forces circuits to be margined for the worst possible case of process, voltage, temperature, and data conditions. This constrains the silicon to operate at worst-case frequencies and at conservative supply voltages. Resilient architectures promise to remove these margins, by detecting and correcting timing errors when they occur, thereby creating the potential to achieve real average-case operation. However, synchronous resilient schemes previously proposed can suffer from multiple issues, including being susceptible to metastability and requiring often complex changes to the architecture to support replay-based recovery from timing errors. These problems respectively lead to circuit failures and/or incur high timing penalties when errors occur. This paper reviews a recently proposed asynchronous bundled-data resilient template called Blade that is robust to metastability issues, requires no replay-based logic, and has low timing error penalties. It also describes some open issues and new research opportunities this template presents, including automation problems to target average-case operation, specific circuit optimizations to minimize resiliency overhead, and the need for new test procedures to tune delay lines and screen out bad chips.
This paper proposes an adaptive pulse generator using Pulse Amplitude Modulation (PAM). The circuit was implemented with eight Pulse Generator Units (PGUs) to produce up to eight monocycles per pulse. The number of monocycles per pulse is inversely proportional to the Power Spectrum Density (PSD) bandwidth in Impulse Radio Ultra-Wide Band (IR-UWB). The complete circuit contains two pulse generator blocks, each composed of eight PGUs, to build a rectangular waveform at the output. The PGU has been implemented with Edge Combiners High (ECH) and Edge Combiners Low (ECL) to encode the information. Each Edge Combiner has a high impedance circuit that is selected by digital control signals. The circuit has been simulated, showing an output pulse amplitude of ≈70 mV for the high logic level and of ≈35 mV for the low logic level, both at a 100 MHz Pulse Repetition Frequency (PRF). This produces a mean pulse duration of ≈270 ps, a mean central frequency of ≈3.7 GHz and a power consumption of less than 0.22 µW. The pulse generator block occupies an area of 0.54 mm².
Symposium on Integrated Circuits and Systems Design, Aug 1, 2016
Static timing analysis (STA) is a widely used technique to perform timing verification of digital circuits analytically. STA relies on timing models of cells. These models are available in standard cell libraries, and are usually generated based on data acquired from timing characterization performed by foundries. Even though advanced node libraries are characterized for some corners, they currently do not cover the voltage levels that will likely drive ultra-low power applications such as those present in domains like IoT and wearable devices, a trend for upcoming years. This work contributes to solving this issue, by proposing a characterization flow that can be applied to any standard cell set. The flow is foundry-independent and relies solely on information that can be obtained from any cell library. It employs commercial tools augmented by a set of in-house scripts. As a case study, the article explores the characterization of a commercial standard cell library for a 28 nm FDSOI technology. The analysis shows that this library can be characterized for voltages as low as 250 mV and still guarantee that its cells, or a subset of these, work as intended by the technology rules. A set of experiments shows that the flow obtains characterization results that match those of the associated commercial library within 5% of error, for those voltages where the latter provides information.
Journal of Communication and Information Systems, Aug 30, 2005
This paper describes the design and prototyping of EMS, a telecommunication intellectual property soft-core developed in the scope of an industry-academia cooperation. EMS performs insertion (mapping) and extraction (demapping) of E1 channels into/from Synchronous Digital Hierarchy (SDH) frames. The basic SDH frame is transmitted at a 155.52 Mbps rate, which allows packing up to sixty-three 2.048 Mbps E1 channels. E1 channels belong to the Plesiochronous Digital Hierarchy (PDH). The paper addresses the solution of several synchronization problems implied by the E1 channel mapping/demapping process. EMS was fully described in RTL VHDL. It was functionally validated by simulation and prototyped in FPGA platforms. Together with the exploration of the techniques involved in embedding PDH into SDH frames, another contribution of the work is the availability of a reusable and parameterizable telecom core with high performance, low latency, and small size. Keywords: SDH, E1, SDH-E1 mapping/demapping, soft IP core.
This work was supported by PUCRS University and Parks SA in the scope of an industry-academia cooperation. Authors 1 and 2 are PhD students of the Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil; emails: {marcon, jcspalma}@inf.ufrgs.br. Authors 3 and 4 are professors of the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Rio Grande do Sul, Brazil; emails: {calazans, moraes}@inf.pucrs.br. Editor, networking area for Revista SBrT: Eduardo W. Bergamini.
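The rate figures quoted in the EMS abstract above can be checked with simple arithmetic. The sketch below only works out the numbers given there (a 155.52 Mbps basic SDH frame rate and sixty-three 2.048 Mbps channels); the variable and function names are illustrative, and the sketch makes no claim about how the remaining capacity is used in the EMS core.

```python
# Rates in kbps, kept as integers to avoid floating-point rounding.
SDH_BASIC_RATE_KBPS = 155_520   # 155.52 Mbps, from the abstract
E1_RATE_KBPS = 2_048            # 2.048 Mbps per channel, from the abstract
E1_CHANNELS = 63                # maximum channels per frame, from the abstract


def payload_summary() -> dict[str, float]:
    """Aggregate tributary rate and what is left of the line rate."""
    tributary = E1_CHANNELS * E1_RATE_KBPS
    return {
        "tributary_kbps": tributary,
        "remaining_kbps": SDH_BASIC_RATE_KBPS - tributary,
        "tributary_fraction": tributary / SDH_BASIC_RATE_KBPS,
    }


if __name__ == "__main__":
    for name, value in payload_summary().items():
        print(f"{name}: {value}")
```

The sixty-three channels add up to 129.024 Mbps, about 83% of the line rate, so the frame has room for them with capacity left over.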
This work presents ARV, an asynchronous superscalar organisation for the RISC-V architecture. As far as the authors could verify, this is the first proposal of an asynchronous version for this recent open source processor architecture. The organisation is modelled using Google's Go as a high level hardware description language. Go has proved adequate to model the refined handshake structures present in the asynchronous design of complex super-scalar structures. Preliminary performance data obtained using the Go model enabled a detailed evaluation of the organisation, providing design exploration of several points to further improve the organisation before committing to its implementation at lower abstraction levels.
This work aims to develop a cross compiler for the occam2 parallel processing language targeting the Texas Instruments TMS320C40 processor, thereby making a language originally based on parallel processing paradigms available to digital signal processing architectures. The work is motivated by a cooperation with the University of Kent, UK, where a similar tool was developed (an occam2 compiler for Sparc workstations). Work completed so far includes obtaining the Kroc source code, studying the code, installing it on the Sun machines, and identifying the modules to be modified. The translation of the compiler is in progress and is divided into three modules: the translator from occam2 source code to TMS320C40 assembly language; the scheduler, which is responsible for concurrency among processes; and the linker, which combines the result of the translation with the scheduler. (CNPq)
Multi-processor Systems-on-Chip (MPSoCs) have been proposed to tackle embedded systems' requirements due to their potential for low-power consumption and high scalability. These systems fit the needs of many application domains, including robotics and autonomous vehicles, in which reliability, performance, and timeliness are critical to operation. In this paper, we propose an integrated environment for the development of robotic applications targeting MPSoCs. The proposed environment eases the evaluation of non-functional requirements by combining cycle-accurate simulations from RTL models with behavioral simulations from robotics. We present a case study of the proposed environment in the context of a UAV (unmanned aerial vehicle) stabilization software, providing performance and energy estimations for different software implementations.
Planning and implementing a semiconductor integrated circuit is a highly complex process. Although physical limits seem to be approaching, it currently follows a growing evolutionary path. As deep submicron technologies evolve towards perhaps even sub-nano geometries, the design process becomes correspondingly more complicated. Some effects that were subtle in larger geometry nodes become relevant or even dominant. Examples are effects that compromise the reliability of wires, such as crosstalk, or the adequate behaviour of gates, such as the increasing sensitivity to single event effects. Design techniques must thus also evolve, to provide a wide range of tools to deal with new effects during the integrated circuit design and test processes. This tutorial covers one set of design techniques that is often overlooked, but which can prove instrumental in dealing with the mentioned technology evolution: the use of clockless or asynchronous circuits. The tutorial is divided into three parts: first i...
2019 IEEE 10th Latin American Symposium on Circuits & Systems (LASCAS)
Random number generators find application in many fields, including cryptography, digital signatures and network equipment testers, to cite a few. Two main classes of such generators are usually proposed: pseudo-random number generators and true-random number generators. The former are simple to build and use, but cannot be employed in every application, especially in those where randomness is meant to support security. The latter can be complicated to build, since they must often rely on hard-to-predict events that are hard to produce in the deterministic world of digital circuits. This work proposes a quasi-random number generator hardware implementation, intended to provide most of the benefits of true-random number generators with costs closer to those of pseudo-random number generators. The quasi-random number generator described here relies on the use of asynchronous circuit design techniques allied to process, voltage and temperature variability to achieve relatively high degrees of randomness. An FPGA prototype demonstrates the feasibility of the approach.
2017 24th IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2017
Network test needs grow as fast as network performance. Networks must be reliable, remain available without interruption, and secure user data. Thus, they must continually be tested for service quality, based on measurements like throughput, latency, and jitter. Network and network equipment tests can rely on test software, dedicated test hardware, or FPGA platforms. The latter offer a trade-off between the limited bandwidth of software solutions and high-performance dedicated test equipment. This work proposes a new, industrial grade, low-cost and accurate network test solution, XGT4. XGT4 employs the NetFPGA SUME board and provides full support to the RFC2544 test benchmarks for 1G and 10G Ethernet equipment, a feature not available in other open source testers. Besides, XGT4 can be remotely operated through a dedicated web interface, and deals with multiple simultaneous 10GbE streams. Using XGT4 to test a commercial DUT yields real-world results for all RFC2544 tests: throughput, latency, back-to-back, frame loss rate, system recovery and reset. An additional feature is timed throughput runs, to conduct bit error rate tests. The XGT4 hardware has a small area footprint, taking less than 20% of the platform FPGA.
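As an illustration of the throughput benchmark listed above, the sketch below searches for the highest offered rate at which a device under test drops no frames. It is not XGT4 code: the binary search strategy, the `frames_lost_at` callback, the stopping resolution, and the simulated device are all assumptions chosen for illustration.

```python
from typing import Callable


def find_throughput(frames_lost_at: Callable[[float], int],
                    line_rate: float,
                    resolution: float = 0.01) -> float:
    """Binary-search the highest rate with zero frame loss.

    `frames_lost_at(rate)` is a hypothetical hook that runs one trial at
    `rate` and returns the number of lost frames. The search stops when the
    interval is narrower than `resolution * line_rate`.
    """
    if frames_lost_at(line_rate) == 0:
        return line_rate          # the device sustains full line rate
    low, high = 0.0, line_rate    # low: known loss-free, high: known lossy
    while high - low > resolution * line_rate:
        mid = (low + high) / 2
        if frames_lost_at(mid) == 0:
            low = mid
        else:
            high = mid
    return low


def simulated_dut(rate_gbps: float) -> int:
    """Stand-in device that starts dropping frames above 7.3 Gbps."""
    return 0 if rate_gbps <= 7.3 else 100


if __name__ == "__main__":
    print(find_throughput(simulated_dut, line_rate=10.0))
```

A real tester would replace `simulated_dut` with a trial that transmits frames for a fixed duration and compares the sent and received counters.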
Side channel attacks (SCA) are known to be efficient techniques to retrieve secret data. In this context, this paper concerns the evaluation of the robustness of secure triple track logic (STTL) against power and electromagnetic analyses on FPGA devices. More precisely, it aims at demonstrating that the basic concepts behind STTL are valid in general and particularly for FPGAs. Also, the paper shows that this new logic may provide interesting design guidelines to obtain circuits that are resistant to differential power analysis (DPA) attacks and also more robust against differential electromagnetic attacks (DEMA).
This work presents an architecture to accelerate the digital systems design cycle at the prototyping stage. The tool consists of a prototyping board (sold by the English company Sundance) comprising a programmable logic device (FPGA), a Transputer processor, memories, and compilers (to generate the FPGA configuration files and the code to be executed by the Transputer). The implementation addresses two approaches: systems described entirely in hardware, and systems described partly in hardware and partly in software. Systems are described as concurrent, communicating processes, where hardware processes are implemented in Handel-C (FPGA) and software processes are implemented in occam (Transputer). Using the prototyping board, it is possible to evaluate which processes should run in hardware (critical execution time) and which should run in software. This work is being developed within the PISH project (PROTEM-CNPq), in which UFPE, UFRGS and PUCRS participate.
Within the PISH project (Projeto Integrado de Software e Hardware, that is, Integrated Software and Hardware Design), a tool is needed to translate VHDL (a hardware description language) into Handel-C. PISH is a PROTEM project involving UFPE, UFRGS and PUCRS. The main goal of PISH is to research and propose methods to automate the design process of hardware and software systems (hardware/software codesign), whose development starts from a functional description at a high level of abstraction. The VHDL description, whose elaboration is the responsibility of UFRGS, is a high-level, structural-domain description used to represent the datapath of a digital system. The Handel-C language, used by the PUCRS prototyping group to generate FPGA configurations, is a hardware description language based on the C and occam languages (occam being a language conceived from the CSP model). The main steps of the work are: studying VHDL to define the subset to be used; studying Handel-C; and implementing the translator. The tool is being implemented from the definition of a grammar, with the help of the Lex & Yacc tools. The translator is being developed on SUN workstations.
The MultiNoC system implements a programmable on-chip multiprocessing platform built on top of an... more The MultiNoC system implements a programmable on-chip multiprocessing platform built on top of an efficient, low area overhead intra-chip interconnection scheme. The employed interconnection structure is a Network on Chip, or NoC. NoCs are emerging as a viable alternative to increasing demands on interconnection architectures, due to the following characteristics: (i) energy efficiency and reliability; (ii) scalability of bandwidth, when compared to traditional bus architectures; (iii) reusability; (iv) distributed routing decisions. An external host computer feeds MultiNoC with application instructions and data. After this initialization procedure, MultiNoC executes some algorithm. After finishing execution of the algorithm, output data can be read back by the host. Sequential or parallel algorithms conveniently adapted to the MultiNoC structure can be executed. The main motivation to propose this design is to enable the investigation of current trends to increase the number of embedded processors in SoCs, leading to the concept of "sea of processors" systems.
JICS. Journal of integrated circuits and systems, May 22, 2023
The design of digital circuits on recent technologies brings several challenges, among which robu... more The design of digital circuits on recent technologies brings several challenges, among which robustness to variations stands out. Variation sources are multiple, and the evolution of integrated circuit fabrication techniques increases the number and relevance of such sources, and the complexity of ensuring circuit robustness against them. Some design paradigms naturally counter variations of one or more types. Asynchronous self-timed design is one such paradigm that can provide robustness to process, voltage, temperature, ageing and IR drop variations, to cite some of the main types. This paper proposes an enhancement to the Pulsar environment, a recently proposed open source automated flow for the design of self-timed clockless circuits. The six components proposed here enable describing choices and decisions on the flow of data tokens inside asynchronous circuits. Design capture in Pulsar can then employ these. To implement the abstract (synthesis-enabled) components, the paper also brings the proposal of the handshaking mutex, a versatile complex gate that eases the design of probe and arbiter, the two most complex among the new components. Results demonstrate the new version of Pulsar is more powerful than the previous, baseline, version, enabling the design capture and the automated synthesis steps of more complex asynchronous self-timed circuits. They also indicate the handshaking mutex operates correctly, and with a good level of attested fairness.
Analog Integrated Circuits and Signal Processing, 2021
The current state of the telecommunications market exhibits a high potential to absorb efficient ... more The current state of the telecommunications market exhibits a high potential to absorb efficient innovations in wireless connectivity, especially those that can be applied to the Internet of Things and similar domains. Contributing in that direction, this paper describes the design and implementation of a fully differential impulse-radio ultra-wideband (IR-UWB) transmitter using pulse-amplitude modulation, with an adaptive power spectrum density (PSD). The architecture can produce up to eight differential monocycles per clock pulse at its output. The number of monocycles controls the bandwidth (thus the PSD) in the mask of IR-UWB technologies, allowing adaptation to multiple standards. The complete transmitter has four main blocks: (a) a pulse generator, comprising two pulse generating circuit groups, to modulate and create a rectangular waveform; (b) an active balun with two amplifiers, to generate differential signals; (c) a digital demultiplexer, to alternate data to the pulse generating circuit groups; (d) a binary-to-thermometer decoder, to control the amount of generated monocycles per pulse. Simulations demonstrate an output pulse amplitude of 120 mV for the high logic level and of 70 mV for the low logic level, both at a 100 MHz Pulse Repetition Frequency. This produces a mean pulse duration of 277 ps, a mean central frequency of 3.8 GHz, and a mean power consumption 6.7 mW. The transmitter takes the form of an intellectual property core in a 130 nm CMOS technology. The complete transmitter area is 0.067 mm 2 , without I/O pads. The outcomes suggest that the proposed circuit can narrow or widen the output signal bandwidth, providing adaptability to different emission requirements.
The periodic nature of the global clock in traditional synchronous designs forces circuits to be ... more The periodic nature of the global clock in traditional synchronous designs forces circuits to be margined for the worst possible case of process, voltage, temperature, and data conditions. This constrains the silicon to operate at worst-case frequencies and at conservative supply voltages. Resilient architectures promise to remove these margins, by detecting and correcting timing errors when they occur, thereby creating the potential to achieve real average-case operation. However, synchronous resilient schemes previously proposed can suffer from multiple issues, including being susceptible to metastability and requiring often complex changes to the architecture to support replay-based recovery from timing errors. These problems respectively lead to circuit failures and/or incur high timing penalties when errors occur. This paper reviews a recently proposed asynchronous bundled-data resilient template called Blade that is robust to metastability issues, requires no replay-based logic, and has low timing error penalties. It also describes some open issues and new research opportunities this template presents, including automation problems to target average-case operation, specific circuit optimizations to minimize resiliency overhead, and the need for new test procedures to tune delay lines and screen out bad chips.
This paper proposes an adaptive pulse generator using Pulse Amplitude Modulation (PAM). The circu... more This paper proposes an adaptive pulse generator using Pulse Amplitude Modulation (PAM). The circuit was implemented with eight Pulse Generator Units (PGUs) to produce up to eight monocycles per pulse. The number of monocycles per pulse is inversely proportional to the Power Spectrum Density (PSD) bandwidth in the Impulse Radio Ultra-Wide Band (IR-UWB). The complete circuit contains two pulse generator blocks, each one composed by eight PGUs to build a rectangular waveform at the output. The PGU has been implemented with Edge Combiners High (ECH) and Edge Combiners Low (ECL) to encode the information. Each Edge Combiner has a high impedance circuit that is selected by digital control signals. The circuit has been simulated, showing an output pulse amplitude of ≈70mV for the high logic level and an amplitude of ≈35mV for the low logic level, both at 100 MHz Pulse Repetition Frequency (PRF). This produces a mean pulse duration of ≈270ps, a mean central frequency of ≈3.7GHz and a power consumption less than 0,22µW. The pulse generator block occupies an area of 0.54mm 2 .
Symposium on Integrated Circuits and Systems Design, Aug 1, 2016
Static timing analysis (STA) is a widely used technique to perform timing verification of digital... more Static timing analysis (STA) is a widely used technique to perform timing verification of digital circuits analytically. These models are available in standard cell libraries, and are usually generated based on data acquired from timing characterization performed by foundries. Even though advanced node libraries are characterized for some corners, they currently do not cover voltage levels that will likely drive ultra-low power applications as those present in domains like IoT and wearable devices, a trend for upcoming years. This work contributes to solve this issue, by proposing a characterization flow that can be applied to any standard cell set. The flow is foundry-independent and relies solely on information that can be obtained from any cell library. It employs commercial tools incremented by a set of in-house scripts. As a case study, the article explores the characterization of a commercial standard cell library for a 28nm FDSOI technology. The analysis shows that this library can be characterized for voltages as low as 250mV and still guarantee that its cells, or a subset of these, work as intended by the technology rules. A set of experiments show that the flow obtains characterization results that match those of the associated commercial library within 5% of error, for those voltages where the latter provides information.
Journal of Communication and Information Systems, Aug 30, 2005
This paper describes the design and prototyping of EMS, a telecommunication intellectual property soft-core developed in the scope of an industry-academia cooperation. EMS performs insertion (mapping) and extraction (demapping) of E1 channels into/from Synchronous Digital Hierarchy (SDH) frames. The basic SDH frame is transmitted at a rate of 155.52 Mbps, which allows packing up to sixty-three 2.048 Mbps E1 channels. E1 channels belong to the Plesiochronous Digital Hierarchy (PDH). The paper addresses the solution of several synchronization problems implied by the E1 channel mapping/demapping process. EMS was fully described in RTL VHDL. It was functionally validated by simulation and prototyped in FPGA platforms. Together with the exploration of the techniques involved in embedding PDH into SDH frames, another contribution of the work is the availability of a reusable and parameterizable telecom core with high performance, low latency, and small size. Keywords: SDH, E1, SDH-E1 mapping/demapping, soft IP core.
This work was supported by PUCRS University and Parks SA in the scope of an industry-academia cooperation. Authors 1 and 2 are PhD students of the Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil; emails: {marcon.jcspalma}@inf.ufrgs.br. Authors 3 and 4 are professors of the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Rio Grande do Sul, Brazil; emails: {calazans.moraes}@inf.pucrs.br. Editor, networking area for Revista SBrT: Eduardo W. Bergamini.
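As a quick check of the rates quoted in the EMS abstract, the sketch below computes the aggregate bandwidth of sixty-three E1 channels and the share of the 155.52 Mbps SDH frame rate they occupy. The constant and function names are illustrative assumptions, not part of EMS, and the sketch does not model the SDH overhead or the mapping structure itself.

```python
# Rates taken from the abstract; all names below are illustrative assumptions.
SDH_FRAME_RATE_MBPS = 155.52  # basic SDH frame rate
E1_RATE_MBPS = 2.048          # rate of one E1 channel
E1_CHANNELS = 63              # maximum number of E1 channels per basic frame


def e1_aggregate_mbps(channels=E1_CHANNELS, rate_mbps=E1_RATE_MBPS):
    """Total bandwidth carried by the given number of E1 channels."""
    return channels * rate_mbps


def e1_share_of_frame(channels=E1_CHANNELS,
                      rate_mbps=E1_RATE_MBPS,
                      frame_rate_mbps=SDH_FRAME_RATE_MBPS):
    """Fraction of the SDH frame rate occupied by the E1 payload."""
    return e1_aggregate_mbps(channels, rate_mbps) / frame_rate_mbps


if __name__ == "__main__":
    print(f"E1 payload: {e1_aggregate_mbps():.3f} Mbps")
    print(f"Share of SDH frame rate: {e1_share_of_frame():.1%}")
```

Sixty-three channels add up to 129.024 Mbps, roughly 83% of the frame rate, which is consistent with the abstract's statement that up to sixty-three channels fit in the basic frame.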
This work presents ARV, an asynchronous superscalar organisation for the RISC-V architecture. As far as the authors could verify, this is the first proposal of an asynchronous version for this recent open source processor architecture. The organisation is modelled using Google's Go as a high level hardware description language. Go has proved adequate to model the refined handshake structures present in the asynchronous design of complex superscalar structures. Preliminary performance data obtained using the Go model enabled a detailed evaluation of the organisation, providing design exploration of several points to further improve the organisation before committing to its implementation at lower abstraction levels.
This work aims to develop a cross compiler for the parallel processing language occam2 targeting the Texas Instruments TMS320C40 processor, thereby making a language originally based on parallel processing paradigms available for digital signal processing architectures. The work is motivated by a cooperation with the University of Kent (UK), where a similar tool was developed (an occam2 compiler for Sparc workstations). Work completed so far includes obtaining the Kroc source code, studying the code, installing it on Sun machines, and identifying the modules to be modified. The translation of the compiler is in progress and is divided into three modules: the translator from occam2 source code to TMS320C40 assembly language; the scheduler, which is responsible for concurrency among processes; and the linker, which combines the translation output with the scheduler. (CNPq)
Multi-processor Systems-on-Chip (MPSoCs) have been proposed to tackle embedded systems' requirements due to their potential for low-power consumption and high scalability. These systems fit the needs of many application domains, including robotics and autonomous vehicles, in which reliability, performance, and timeliness are critical to operation. In this paper, we propose an integrated environment for the development of robotic applications targeting MPSoCs. The proposed environment eases the evaluation of non-functional requirements by combining cycle-accurate simulations from RTL models with behavioral simulations from robotics. We present a case study of the proposed environment in the context of UAV (unmanned aerial vehicle) stabilization software, providing performance and energy estimations for different software implementations.
Planning and implementing a semiconductor integrated circuit is a highly complex process. Although physical limits seem to be approaching, it currently follows a growing evolutionary path. As deep submicron technologies evolve towards perhaps even sub-nano geometries, the design process becomes correspondingly more complicated. Once subtle in higher geometry nodes, some effects become relevant or even dominant. Examples are effects that impair the reliability of wires, such as crosstalk, or the adequate behaviour of gates, such as the increasing sensitivity to single event effects. Design techniques must thus also evolve, to provide a wide range of tools to deal with new effects during the integrated circuit design and test processes. This tutorial covers one set of design techniques that is often overlooked, but which can prove instrumental in dealing with the mentioned technology evolution: the use of clockless or asynchronous circuits. The tutorial is divided into three parts: first i...
2019 IEEE 10th Latin American Symposium on Circuits & Systems (LASCAS)
Random number generators find application in many fields, including cryptography, digital signatures and network equipment testers, to cite a few. Two main classes of such generators are usually proposed, pseudo-random number generators and true-random number generators. The former are simple to build and use, but cannot be employed in every application, especially in those where randomness is meant to support security. The latter can be complicated to build, since they often must rely on hard-to-predict events that are hard to produce in the deterministic world of digital circuits. This work proposes a quasi-random number generator hardware implementation, intended to provide most of the benefits of true-random number generators with costs closer to those of pseudo-random number generators. The quasi-random number generator described here relies on the use of asynchronous circuit design techniques allied to process, voltage and temperature variability to achieve relatively high degrees of randomness. An FPGA prototype demonstrates the feasibility of the approach.
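To illustrate why pseudo-random generators are simple to build yet unsuitable where randomness supports security, here is a minimal sketch of a textbook 16-bit Fibonacci linear feedback shift register (taps 16, 14, 13, 11). It is a generic pseudo-random generator chosen for illustration, not the quasi-random generator proposed in the paper, and all function names are assumptions.

```python
from typing import List


def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (feedback << 15)


def lfsr16_sequence(seed: int, count: int) -> List[int]:
    """Return `count` successive states starting after `seed`."""
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a non-zero 16-bit value")
    states = []
    state = seed
    for _ in range(count):
        state = lfsr16_step(state)
        states.append(state)
    return states


def lfsr16_period(seed: int) -> int:
    """Count the steps needed for the register to return to `seed`."""
    state = lfsr16_step(seed)
    steps = 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps


if __name__ == "__main__":
    # The same seed always reproduces the same sequence.
    print(lfsr16_sequence(0xACE1, 4) == lfsr16_sequence(0xACE1, 4))
    print(lfsr16_period(0xACE1))
```

The whole generator is a shift and a few XORs, but anyone who learns the seed and taps can reproduce every output, which is the weakness that motivates true-random and quasi-random designs.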
2017 24th IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2017
Network test needs grow as fast as network performance. Networks must be reliable, remain available without interruption, and secure user data. Thus, they must be continually tested for service quality, based on measurements such as throughput, latency, and jitter. Network and network equipment tests can rely on test software, dedicated test hardware, or FPGA platforms. The latter offer a trade-off between the limited bandwidth of software solutions and the high performance of dedicated test equipment. This work proposes a new, industrial grade, low-cost and accurate network test solution, XGT4. XGT4 employs the NetFPGA SUME board and provides full support to the RFC2544 test benchmarks for 1G and 10G Ethernet equipment, a feature not available in other open source testers. Besides, XGT4 can be remotely operated through a dedicated web interface, and deals with multiple simultaneous 10GbE streams. Using XGT4 to test a commercial DUT yields real-world results for all RFC2544 tests: throughput, latency, back-to-back, frame loss rate, system recovery and reset. An additional feature is timed throughput runs, to conduct bit error rate tests. The XGT4 hardware has a small area footprint, taking less than 20% of the platform FPGA.
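The RFC2544 throughput test mentioned above looks for the highest offered rate at which the device under test drops no frames. The sketch below shows one common way to organise that search, a binary search over the offered rate, together with the theoretical Ethernet frame rate used as the upper bound. It is a minimal illustration and does not describe how XGT4 implements the test; `send_trial` and `fake_dut` are hypothetical stand-ins for real traffic generation.

```python
from typing import Callable

# Per-frame overhead on the wire: 8-byte preamble/SFD + 12-byte minimum gap.
ETH_OVERHEAD_BYTES = 20


def max_frame_rate(link_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum frames per second on an Ethernet link."""
    return link_bps / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8)


def find_throughput(send_trial: Callable[[float], int],
                    line_rate_fps: float,
                    resolution: float = 0.001) -> float:
    """Binary-search the highest rate (frames/s) with zero lost frames.

    `send_trial(rate)` is a hypothetical hook that offers traffic at `rate`
    and returns the number of frames lost. `resolution` is the search
    precision as a fraction of the line rate.
    """
    if send_trial(line_rate_fps) == 0:
        return line_rate_fps
    low, high = 0.0, line_rate_fps  # low: no loss so far, high: loss seen
    while (high - low) / line_rate_fps > resolution:
        mid = (low + high) / 2
        if send_trial(mid) == 0:
            low = mid
        else:
            high = mid
    return low


def fake_dut(capacity_fps: float) -> Callable[[float], int]:
    """Hypothetical device that drops every frame above `capacity_fps`."""
    def trial(rate_fps: float) -> int:
        return 0 if rate_fps <= capacity_fps else int(rate_fps - capacity_fps) + 1
    return trial


if __name__ == "__main__":
    line_rate = max_frame_rate(10e9, 64)  # 10GbE, 64-byte frames
    measured = find_throughput(fake_dut(0.6 * line_rate), line_rate)
    print(f"line rate: {line_rate:.0f} fps, measured: {measured:.0f} fps")
```

With 64-byte frames, a 10G link carries at most about 14.88 million frames per second, so a search resolution of 0.1% of line rate needs only around ten trials per frame size.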
Side channel attacks (SCA) are known to be efficient techniques to retrieve secret data. In this context, this paper concerns the evaluation of the robustness of secure triple track logic (STTL) against power and electromagnetic analyses on FPGA devices. More precisely, it aims at demonstrating that the basic concepts behind STTL are valid in general and particularly for FPGAs. Also, the paper shows that this new logic may provide interesting design guidelines to get circuits that are resistant to differential power analysis (DPA) attacks and also more robust against differential electromagnetic attacks (DEMA).
This work presents an architecture to accelerate the digital system design cycle at the prototyping stage. The tool in question consists of a prototyping board (sold by the English company Sundance) comprising a programmable logic device (FPGA), a Transputer processor, memories, and compilers (to generate the FPGA configuration files and the code to be executed by the Transputer). The implementation addresses two approaches: systems described entirely in hardware, and systems described partly in hardware and partly in software. Systems are described as concurrent, communicating processes; hardware processes are implemented in Handel-C (FPGA) and software processes are implemented in occam (Transputer). Using the prototyping board, it is possible to evaluate which processes should run in hardware (those with critical execution time) and which should run in software. This work is being developed within the PISH project (PROTEM-CNPq), in which UFPE, UFRGS and PUCRS participate.
Within the PISH project (Projeto Integrado de Software e Hardware, or Integrated Software and Hardware Project), a tool is needed to translate VHDL (a hardware description language) into Handel-C. PISH is a PROTEM project involving UFPE, UFRGS and PUCRS. Its main goal is to research and propose methods to automate the design process of hardware and software systems (hardware/software codesign), whose development starts from a functional description at a high level of abstraction. The VHDL description, whose elaboration is the responsibility of UFRGS, is at a high level of abstraction in a structural-domain description, and is used to represent the datapath (operational block) of a digital system. Handel-C, used by the PUCRS prototyping group to generate FPGA configurations, is a hardware description language based on C and occam (occam being a language conceived from the CSP model). The main steps of the work are: studying VHDL to define the subset to be used; studying Handel-C; and implementing the translator. The tool is being implemented from a grammar definition, with the aid of Lex & Yacc. The translator is being developed on SUN workstations.
Papers by Ney Laert L V Vilar Calazans