V. SUMMARY We have given parallel algorithms for recognizing and parsing context-free languages on a hypercube of p PE's, 1 ≤ p ≤ n. The algorithms are both time-wise and space-wise optimal with respect to the most efficient sequential algorithm. The recognition algorithms were ...
This paper compares two automatic systolic array design methods: the so-called space-time transformation methodology (a unifying approach to the design of VLSI algorithms (14), (15)), and a functional-based design method (see (6), (9), (10)). The advantages (and possible disadvantages) of each method are pointed out by representative case studies (variants of systolic arrays generated with both
Csp-Prover provides a deep encoding of the process algebra Csp in the interactive theorem prover Isabelle. Here, we extend Csp-Prover by a framework for the deadlock analysis of networks. As a typical example we study systolic arrays and prove in Csp-Prover that Kung's classical algorithm for matrix multiplication is deadlock-free.
In this paper we propose a novel two-dimensional clocking and timing scheme for systems that permits a reduction in the longest line length in each clocking zone. The proposed clocking schemes use logic propagation techniques that have been developed for systolic arrays. Placement of QCA cells is modified to ensure correct signal generation and timing. The significant reduction in the longest line length permits fast timing and efficient pipelining, while guaranteeing kink-free behavior in switching.
This paper describes the system construction of a DA-FIR filter, optimized for the design of distributed arithmetic (DA) finite impulse response (FIR) filters and based on an architecture with tightly coupled, co-processor-based data processing units. The DA-based FIR filter uses a series of look-up-table (LUT) accesses to emulate multiply-and-accumulate operations and is implemented on an FPGA. The very high speed integrated circuit hardware description language (VHDL) is used to implement the proposed filter, and the design is verified by simulation. The paper discusses two optimization algorithms, and the resulting optimizations are incorporated into the LUT layer and architecture extractions. The proposed method yields an optimized design that, on average, reduces the number of LUTs, the number of populated slices, and the gate count of the DA-FIR filter. This research points toward the development of bio-inspired computing architectures that avoid logically intensive operations while meeting the desired specifications for performance, timing, and reliability.
Keywords: Bio-inspired computing; Distributed arithmetic; Finite impulse response; MAC and parallel filters; Processor architecture; Systolic array
This is an open access article under the CC BY-SA license.
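The way a series of LUT accesses can stand in for multiply-and-accumulate operations may be easier to see in a small software model. The sketch below is only an illustration of the general distributed-arithmetic idea, not the filter or the optimizations from the paper: it assumes unsigned integer samples of a fixed bit width and integer coefficients, and the names `build_lut`, `da_dot`, and `da_fir` are hypothetical.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the coefficients.

    Entry `addr` holds the sum of coeffs[k] for each bit k set in `addr`.
    (Assumption: a full 2**N-entry table; real designs often partition it.)
    """
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_dot(lut, samples, bits):
    """Inner product of coefficients and samples using only LUT reads,
    shifts, and adds. Samples are assumed unsigned and < 2**bits."""
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        # Weight the looked-up partial sum by 2**b and accumulate.
        acc += lut[addr] << b
    return acc


def da_fir(coeffs, signal, bits):
    """Run a direct-form FIR filter over `signal` via da_dot."""
    lut = build_lut(coeffs)
    window = [0] * len(coeffs)
    out = []
    for x in signal:
        window = [x] + window[:-1]  # newest sample lines up with coeffs[0]
        out.append(da_dot(lut, window, bits))
    return out


if __name__ == "__main__":
    print(da_fir([1, 2], [1, 2, 3], bits=2))
```

The loop over `b` replaces the per-tap multiplications: each pass does one table read and one shifted addition, which is the trade a hardware DA filter makes between LUT size and multiplier logic.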
A new VLSI algorithm and its associated systolic array architecture for a prime-length type IV discrete cosine transform are presented. They form the basis of an efficient design approach for deriving a linear systolic array architecture for type IV DCT. The proposed algorithm uses a regular computational structure, called a pseudoband correlation structure, that is appropriate for a VLSI implementation. The algorithm is then mapped onto a linear systolic array with a small number of I/O channels and low I/O bandwidth. The proposed architecture can be unified with that obtained for type IV DST because the two share a similar kernel. A highly efficient VLSI chip can thus be obtained, with good performance in architectural topology, computing parallelism, processing speed, hardware complexity, and I/O costs, similar to that obtained for circular correlation and cyclic convolution computational structures.
Eliminating cryptographic computation errors is vital for preventing attacks. A simple approach is to verify the correctness of the cipher before outputting it. Multiplication is the most significant arithmetic operation among the cryptographic computations. Hence, a multiplier with concurrent error detection ability is urgently needed to avert attacks. Employing the re-computing shifted operand concept, this study presents a semi-systolic array polynomial basis multiplier with concurrent error detection and minimal area overhead. Moreover, the proposed multiplier requires only two extra clock cycles, while traditional multipliers using XOR trees consume at least \(\left\lceil {\log _2 m} \right\rceil\) extra XOR gate delays in \(GF(2^m)\) fields.
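As a rough software analogy for the idea of checking a polynomial basis multiplication by recomputing it with a shifted operand, the sketch below multiplies in \(GF(2^m)\), repeats the multiplication with one operand multiplied by \(x\), and compares the two results. It is a minimal sketch under stated assumptions and not the semi-systolic hardware design from the paper: the function names, the `fault` parameter used to model an injected error, and the choice of the AES polynomial in the usage example are all illustrative.

```python
def gf2m_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m), polynomial basis.

    `poly` is the reduction polynomial as a bit mask including the x^m term
    (e.g. 0x11B for x^8 + x^4 + x^3 + x + 1). Operands are assumed < 2**m.
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= poly            # reduce modulo the field polynomial
    return result


def mul_with_shifted_check(a, b, poly, m, fault=0):
    """Return (product, ok). `ok` is False when the recomputation with the
    shifted operand disagrees with the first result.

    `fault` is a hypothetical error mask XORed into the first result to
    model a computation error.
    """
    c = gf2m_mul(a, b, poly, m) ^ fault
    a_shifted = gf2m_mul(a, 0b10, poly, m)       # a * x
    c_shifted = gf2m_mul(a_shifted, b, poly, m)  # (a * x) * b
    ok = gf2m_mul(c, 0b10, poly, m) == c_shifted  # expect c * x
    return c, ok


if __name__ == "__main__":
    print(mul_with_shifted_check(0x57, 0x83, 0x11B, 8))
    print(mul_with_shifted_check(0x57, 0x83, 0x11B, 8, fault=0x01))
```

Because multiplication by \(x\) is invertible in the field, any nonzero error in the first result changes the left side of the comparison, so this model flags it. A fault that corrupts both computations identically would go undetected, which is one reason hardware schemes take care over how the two passes share logic.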
Biological sequence alignment is becoming a popular and interesting field for researchers, especially bioinformaticians. Two sequences of similar or varying lengths can be aligned using an alignment algorithm such as the Smith-Waterman Algorithm (SWA), the Needleman-Wunsch Algorithm (NWA), BLAST, or FASTA. Some of these algorithms are fast but lack accuracy (like FASTA and BLAST), while others are accurate at the expense of time (like SWA). SWA uses a dynamic programming approach that is constrained in both time and space. Researchers have applied various methods to the algorithm (such as systolic arrays and FPGA implementations) to reduce its computational cost. This paper focuses on Recursive Variable Expansion (RVE), applied in parallel to the Smith-Waterman algorithm, to tackle the algorithm's time constraints, and compares the result with another researcher's work.
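For readers unfamiliar with the dynamic programming recurrence that makes SWA accurate but slow, the following is a minimal sequential sketch of the score computation. It is the plain textbook baseline, not the RVE or parallel variant discussed in the paper, and the scoring values (`match=2`, `mismatch=-1`, linear `gap=-1`) and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Uses a linear gap penalty and fills the full (len(a)+1) x (len(b)+1)
    table, so time and space are both O(len(a) * len(b)).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTYY", "ZZACGTWW"))
```

Each cell depends on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That dependency pattern is what systolic-array and other parallel implementations exploit.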