Block Recombination Approach for Subquadratic Space Complexity Binary Field Multiplication Based on Toeplitz Matrix-Vector Product
In this paper, we present a new method for parallel binary finite field multiplication which results in subquadratic space complexity. The method is based on decomposing the building blocks of the Fan-Hasan subquadratic Toeplitz matrix-vector ...
High-Speed Architectures for Multiplication Using Reordered Normal Basis
Normal basis has been widely used for the representation of binary field elements mainly due to its low-cost squaring operation. Optimal normal basis type II is a special class of normal basis exhibiting very low multiplication complexity and is ...
On Modulo 2^n+1 Adder Design
Two architectures for modulo 2^n+1 adders are introduced in this paper. The first one is built around a sparse carry computation unit that computes only some of the carries of the modulo 2^n+1 addition. This sparse approach is enabled by the ...
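For readers who want a concrete picture of the operation these adders compute, the following is a minimal behavioural reference model of modulo 2^n+1 addition. It is not the sparse-carry architecture from the paper. The function name `add_mod_2n_plus_1` and the plain (non-diminished) operand encoding are assumptions made for illustration only.

```python
def add_mod_2n_plus_1(a: int, b: int, n: int) -> int:
    """Reference model: (a + b) mod (2^n + 1) for operands in [0, 2^n].

    Hypothetical helper for illustration; a hardware adder would derive
    the same result from carry logic rather than a comparison.
    """
    modulus = (1 << n) + 1
    if not (0 <= a < modulus and 0 <= b < modulus):
        raise ValueError("operands must lie in [0, 2^n]")
    s = a + b
    # The raw sum is at most 2^(n+1), so one conditional subtraction suffices.
    return s - modulus if s >= modulus else s


if __name__ == "__main__":
    # n = 4 gives modulus 17.
    print(add_mod_2n_plus_1(16, 16, 4))
    print(add_mod_2n_plus_1(9, 8, 4))
```

A model like this is mainly useful as a golden reference when checking a gate-level or RTL adder exhaustively for small n.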
An Online Failure Detection Method for Data Buses Using Multithreshold Receiving Logic
Random voltage changes on bus lines may lead to reading corrupted digital data. A novel methodology for detecting such anomalies is proposed. It uses multiple threshold voltages at the receiver end. In earlier work it was shown that multiple voltage ...
NUDA: A Non-Uniform Debugging Architecture and Nonintrusive Race Detection for Many-Core Systems
Traditional debugging methodologies are limited in their ability to provide debugging support for many-core parallel programming. Synchronization problems or bugs due to race conditions are particularly difficult to detect with existing debugging tools. ...
An Efficient TCAM-Based Implementation of Multipattern Matching Using Covered State Encoding
This paper proposes a state encoding scheme called a covered state encoding for the efficient TCAM-based implementation of the Aho-Corasick multipattern matching algorithm, which is widely used in network intrusion detection systems. Since the ...
An OpenMP Compiler for Efficient Use of Distributed Scratchpad Memory in MPSoCs
Most of today's state-of-the-art processors for mobile and embedded systems feature on-chip scratchpad memories. To efficiently exploit the advantages of low-latency high-bandwidth memory modules in the hierarchy, there is the need for programming ...
DMA++: On the Fly Data Realignment for On-Chip Memories
Multimedia extensions based on Single-Instruction Multiple-Data (SIMD) units are widespread. They have been used, for some time, in processors and accelerators (e.g., the Cell SPEs). SIMD units usually have significant memory alignment constraints in ...
CPU Accounting for Multicore Processors
In single-threaded processors and Symmetric Multiprocessors the execution time of a task depends on the other tasks it runs with (the workload), since the Operating System (OS) time shares the CPU(s) between tasks in the workload. However, the time ...
Bounded Relay Hop Mobile Data Gathering in Wireless Sensor Networks
Recent research reveals that great benefit can be achieved for data gathering in wireless sensor networks by employing mobile collectors that gather data via short-range communications. To pursue maximum energy saving at sensor nodes, intuitively, a mobile ...
Pancyclicity of Matching Composition Networks under the Conditional Fault Model
A graph $G=(V,E)$ is said to be \emph{conditional k-edge-fault pancyclic} if, after removing $k$ faulty edges from $G$ and provided that each node is incident to at least two fault-free edges, the resulting graph contains a cycle of every length from ...
FFT Implementation with Fused Floating-Point Operations
This paper describes two fused floating-point operations and applies them to the implementation of fast Fourier transform (FFT) processors. The fused operations are a two-term dot product and an add-subtract unit. The FFT processors use "butterfly" ...
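As a rough illustration of how a two-term dot product and an add-subtract unit can cover the arithmetic of a radix-2 butterfly, the sketch below models both operations behaviourally. The names `fused_dot2`, `fused_add_sub`, and `butterfly`, and the mapping of operations onto the butterfly, are assumptions and are not taken from the paper. Python floats round after every operation, so this does not reproduce the single rounding of a real fused unit.

```python
def fused_dot2(a: float, b: float, c: float, d: float) -> float:
    """Behavioural stand-in for a two-term dot product a*b + c*d.

    Hypothetical model: hardware would round once; Python rounds each step.
    """
    return a * b + c * d


def fused_add_sub(x: float, y: float) -> tuple[float, float]:
    """Behavioural stand-in for an add-subtract unit: returns (x + y, x - y)."""
    return x + y, x - y


def butterfly(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    """Radix-2 butterfly: returns (a + w*b, a - w*b)."""
    # The complex product w*b needs two two-term dot products.
    t_re = fused_dot2(w.real, b.real, -w.imag, b.imag)
    t_im = fused_dot2(w.real, b.imag, w.imag, b.real)
    # Each add-subtract yields one component of both outputs.
    top_re, bot_re = fused_add_sub(a.real, t_re)
    top_im, bot_im = fused_add_sub(a.imag, t_im)
    return complex(top_re, top_im), complex(bot_re, bot_im)


if __name__ == "__main__":
    print(butterfly(1 + 2j, 3 + 4j, 1j))
```

Under these assumptions, one butterfly uses two dot-product operations and two add-subtract operations instead of four multiplies and six separate additions or subtractions.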