Abstract
This paper presents the design and implementation of efficient, reconfigurable hardware for parallel prefix computation on field-programmable gate arrays (FPGAs). The design is based on a pipelined dataflow algorithm, with control logic added to reconfigure the system for an arbitrary degree of parallelism. The system receives multiple input streams of elements in parallel and produces multiple output streams in parallel. Its advantage is that the degree of parallelism can be controlled explicitly at run time. The time complexity of the design is O(d+(N−d)/d), where d is the degree of parallelism and N is the stream size. When the stream size N is sufficiently larger than the initial trigger time of the pipeline (d), the time complexity becomes O(N/d). Unlike the prefix computation circuits found in the literature, the design scales to different problem sizes, including data of unknown size. The design is modular, based on a finite state machine, and was implemented and tested on the target FPGA devices Xilinx Spartan2S XC2S300EFT256-6Q and XC2S600EFG676-6.
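The behaviour described above can be illustrated with a small software model. The Python sketch below is not the hardware design of the paper: it only assumes an associative operator, consumes up to d elements per step from a stream whose length need not be known in advance, and carries the running prefix from one step to the next. The names chunked_prefix and modeled_time are hypothetical, and modeled_time simply evaluates the expression d+(N−d)/d from the abstract with all constants ignored.

```python
from itertools import accumulate
import operator


def chunked_prefix(stream, d, op=operator.add):
    """Software model only (assumption, not the paper's circuit).

    Consumes up to d elements per step from an iterable of unknown
    length and returns (list of prefixes, number of steps taken).
    op must be associative; it does not need to be commutative.
    """
    if d < 1:
        raise ValueError("d must be at least 1")
    prefixes = []
    carry = None      # running prefix of everything consumed so far
    steps = 0
    batch = []

    def flush(batch, carry):
        # Prefixes local to this batch, then combined with the carry.
        local = list(accumulate(batch, op))
        if carry is not None:
            local = [op(carry, v) for v in local]
        prefixes.extend(local)
        return local[-1]

    for x in stream:
        batch.append(x)
        if len(batch) == d:
            carry = flush(batch, carry)
            batch = []
            steps += 1
    if batch:  # final, partially filled batch
        carry = flush(batch, carry)
        steps += 1
    return prefixes, steps


def modeled_time(n, d):
    """Evaluates d + (N - d)/d from the abstract, constants ignored."""
    return d + (n - d) / d


if __name__ == "__main__":
    sums, steps = chunked_prefix(iter(range(1, 11)), d=4)
    print(sums, steps)
    print(modeled_time(1024, 4), 1024 / 4)
```

For large N the two printed values in the last line are close to each other, which matches the abstract's remark that the complexity approaches O(N/d) once N dominates the trigger time d.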
Cite this article
Park, J.H., Dai, H.K. Reconfigurable hardware solution to parallel prefix computation. J Supercomput 43, 43–58 (2008). https://doi.org/10.1007/s11227-007-0137-1
DOI: https://doi.org/10.1007/s11227-007-0137-1