Benchmark Synthesis Using the LRU Cache Hit Function
The LRU cache hit function is used as a general characterization of locality of reference to address the synthesis question of whether benchmarks can be created that have a required locality of reference. Several results are given that show ...
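As an illustration of the quantity this abstract refers to, the following is a minimal sketch that computes an LRU hit function from a reference trace using stack distances. It assumes a fully associative cache and takes the hit function to be the hit ratio as a function of cache size; the function name `lru_hit_function` and the sample trace are hypothetical and are not taken from the paper.

```python
def lru_hit_function(trace):
    """Return a list h where h[c] is the LRU hit ratio of a fully
    associative cache holding c lines (assumed definition)."""
    stack = []            # LRU stack, most recently used item first
    distance_counts = {}  # stack distance -> number of references
    for ref in trace:
        if ref in stack:
            # Stack distance: 1-based depth of the item before this access.
            depth = stack.index(ref) + 1
            distance_counts[depth] = distance_counts.get(depth, 0) + 1
            stack.remove(ref)
        stack.insert(0, ref)

    # A cache of c lines hits on every reference with stack distance <= c.
    hit_ratios = [0.0]
    hits = 0
    for size in range(1, len(stack) + 1):
        hits += distance_counts.get(size, 0)
        hit_ratios.append(hits / len(trace))
    return hit_ratios


if __name__ == "__main__":
    sample_trace = ["a", "b", "a", "c", "b", "a"]
    for size, ratio in enumerate(lru_hit_function(sample_trace)):
        print(size, round(ratio, 3))
```

The synthesis question in the abstract runs in the opposite direction: given a required hit function, find a trace that produces it. The sketch covers only the forward computation.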
A Generalized Message-Passing Mechanism for Communicating Sequential Processes
Bidirectional message-passing (bi-io), a novel symmetric communication mechanism for concurrent processes, is introduced and developed. The mechanism is symmetric in the sense that, in one atomic action, a message is transmitted in each direction ...
Continuous Models for Communication Density Constraints on Multiprocessor Performance
Fundamental limits on the communication capabilities of massively parallel multiprocessors are investigated. It is shown that in the limit of machines of infinite extent in which the number of processors per unit volume is constant and in which the ...
Systolic Super Summation
A principal limitation in accuracy for scientific computation performed with floating-point arithmetic is due to the computation of repeated sums, such as those that arise in inner products. A systolic super summer of cellular design is proposed for the ...
Theory of Clocking for Maximum Execution Overlap of High-Speed Digital Systems
The effect of clocking schemes on overlapped execution performance in a digital system is described and quantified. Effects of branching, data dependencies, and resource conflicts between consecutive tasks are considered. Some problems of clocking ...
A Synthesis Algorithm for Reconfigurable Interconnection Networks
The performance of a parallel algorithm depends in part on the interconnection topology of the target parallel system. An interconnection network is called reconfigurable if its topology can be changed between different algorithm executions. Since ...
Cache Operations by MRU Change
The performance of set associative caches is analyzed. The method used is to group the cache lines into regions according to their positions in the replacement stacks of a cache, and then to observe how the memory access of a CPU is distributed over ...
Abstract Specification of Synchronous Data Types for VLSI and Proving the Correctness of Systolic Network Implementations
A combined methodology is presented for specifying abstract synchronous data types and proving the correctness of systolic network implementations. It is shown that an extension of the Parnas trace method of specifying software modules containing ...
On Two-Dimensional Via Assignment for Single-Row Routing
The authors study the via assignment problem when vias are allowed to appear rowwise as well as columnwise. Previously they proved that the problem belongs to the class of NP-hard problems and therefore it is unlikely that polynomial-time algorithms ...
Systolic Tree Implementation of Data Structures
Systolic tree architectures are presented for data structures such as stacks, queues, dequeues, priority queues, and dictionary machines. The stack, queue, and dequeue have a unit response time and a unit pipeline interval. The priority queue also has a ...
A Comparison of VLSI Architecture of Finite Field Multipliers Using Dual, Normal, or Standard Bases
Three different finite-field multipliers are presented: (1) a dual-basis multiplier due to E.R. Berlekamp (1982); (2) the Massey-Omura normal basis multiplier; and (3) the Scott-Tavares-Peppard standard basis multiplier. These algorithms are chosen because ...
Approximate Analysis of Fork/Join Synchronization in Parallel Queues
An approximation technique, called scaling approximation, is introduced and applied to the analysis of homogeneous fork/join queuing systems consisting of K or=32.
A Simple Method for Determining Hadamard Sequency Vectors
A simple method for determining the sequency ordering of any row in any Hadamard matrix directly from its binary representation is developed. This proposed method is proved to be much simpler than the well-known bit-reverse inverse Gray code method.
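The abstract does not reproduce the proposed method, so the sketch below shows only the baseline it names: the bit-reverse inverse Gray code computation, checked against a direct count of sign changes. It assumes a natural-ordered (Sylvester) Hadamard matrix of size 2^bits with bits >= 1; all function names are hypothetical.

```python
def hadamard_row(n, bits):
    """Row n of the natural-ordered Hadamard matrix of size 2**bits.
    Entry (n, j) is +1 when n & j has an even number of 1 bits, else -1."""
    return [1 - 2 * (bin(n & j).count("1") & 1) for j in range(1 << bits)]


def sequency_by_counting(n, bits):
    """Sequency defined directly as the number of sign changes in the row."""
    row = hadamard_row(n, bits)
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def sequency_by_bit_reverse_inverse_gray(n, bits):
    """Baseline method: bit-reverse the row index, then decode it as a
    Gray code word to obtain the sequency."""
    reversed_n = int(format(n, "0{}b".format(bits))[::-1], 2)
    # Gray-to-binary conversion: XOR together all right shifts of the word.
    binary = 0
    gray = reversed_n
    while gray:
        binary ^= gray
        gray >>= 1
    return binary


if __name__ == "__main__":
    bits = 3
    for n in range(1 << bits):
        print(n, sequency_by_counting(n, bits),
              sequency_by_bit_reverse_inverse_gray(n, bits))
```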
Definition and Design of Strongly Language Disjoint Checkers
Strongly language-disjoint (SLD) checkers are to sequential systems what strongly code-disjoint checkers are to combinatorial systems. SLD checkers are the largest class of checkers with which a functional system can achieve the totally self-checking ...
A New Bit-Serial Systolic Multiplier Over GF(2^m)
A bit-serial systolic array has been developed to compute multiplications over GF(2^m). In contrast to a previously designed systolic multiplier, this algorithm allows the input elements to enter a linear systolic array in the same order, and the ...
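For readers unfamiliar with the operation being computed, here is a minimal software sketch of multiplication over GF(2^m) in a standard (polynomial) basis. It processes one multiplier bit per step, most significant bit first. It is not the systolic array of the paper; the function name `gf2m_multiply` and the example field GF(2^4) with the irreducible polynomial x^4 + x + 1 are assumptions chosen for illustration.

```python
def gf2m_multiply(a, b, m, poly):
    """Multiply a and b in GF(2^m), standard basis.

    a, b : field elements as integers below 2**m (bit i = coefficient of x^i)
    poly : irreducible polynomial of degree m, including the x^m term
    """
    result = 0
    for i in range(m - 1, -1, -1):      # scan the bits of b, MSB first
        result <<= 1                    # multiply the partial result by x
        if (result >> m) & 1:           # degree reached m: reduce modulo poly
            result ^= poly
        if (b >> i) & 1:                # add a (XOR) when the current bit is 1
            result ^= a
    return result


if __name__ == "__main__":
    M = 4
    POLY = 0b10011                      # x^4 + x + 1 (assumed example field)
    # x * x^3 = x^4, which reduces to x + 1 in this field.
    print(bin(gf2m_multiply(0b0010, 0b1000, M, POLY)))
```

Addition in GF(2^m) is bitwise XOR, which is why each step needs only a shift, a conditional reduction, and a conditional XOR.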
Strongly Code Disjoint Checkers
Strongly code-disjoint (SCD) checkers are defined and shown to include totally self-checking (TSC) code-disjoint checkers. This type of checker is the natural companion of strongly fault-secure (SFS) networks. SCD checkers are the largest class of ...
Functional Test Generation Based on Unate Function Theory
The generation of a universal test set (UTS) for unate functions is used as a starting point. This test set is complete and minimal for the set of all unateness-preserving faults. However, for functions that are not unate in any variable, the UTS ...
Minimum Complexity FIR Filters and Sparse Systolic Arrays
The properties of B-spline approximation and the integral/derivative properties of convolution lead to efficient algorithms for the implementation of multidimensional FIR filters. The implementations are of minimum time complexity under the Nyquist ...