DOI: 10.5555/786452.786684

Kestrel: Design of an 8-bit SIMD Parallel Processor

Published: 15 September 1997

Abstract

Kestrel is a high-performance programmable parallel co-processor. Its design is the result of examination and reexamination of algorithmic, architectural, packaging, and silicon design issues, and the interrelations between them. The final system features a linear array of 8-bit processing elements, each with local memory, an arithmetic logic unit (ALU), a multiplier, and other functional units. Sixty-four Kestrel processing elements fit in a 1.4 million transistor, 60 mm^2, 0.5 micron CMOS chip with just 84 pins. The planned single-board, 8-chip system will, for some applications, provide supercomputer performance at a fraction of the cost. This paper surveys four of our applications (sequence analysis, neural networks, image compression, and floating-point arithmetic), and discusses the philosophy behind many of the design decisions. We present the processing element and system architectures, emphasizing the ALU and comparator's compact instruction encoding and design, the architecture's facility with nested conditionals, and the multiplier's flexibility in performing multiprecision operations. Finally, we discuss the implementation and performance of the Kestrel test chips.
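
The abstract mentions multiprecision operations built on an 8-bit multiplier. As a rough illustration only, the sketch below shows the general schoolbook technique of multiplying wide integers using nothing larger than 8-bit by 8-bit products. It does not model Kestrel's actual instruction set or multiplier datapath; the function names `to_limbs`, `from_limbs`, and `multiply_limbs` are hypothetical and chosen for this example.

```python
def to_limbs(value, count):
    """Split a non-negative integer into `count` 8-bit limbs, least significant first."""
    return [(value >> (8 * i)) & 0xFF for i in range(count)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian 8-bit limbs."""
    return sum(limb << (8 * i) for i, limb in enumerate(limbs))


def multiply_limbs(a, b):
    """Schoolbook multiply of two little-endian 8-bit limb lists.

    Assumption: this is the textbook algorithm, not Kestrel's microcode.
    Every intermediate value fits in 16 bits, as an 8-bit multiplier
    with an accumulating add would require.
    """
    result = [0] * (len(a) + len(b))
    for i, a_limb in enumerate(a):
        carry = 0
        for j, b_limb in enumerate(b):
            # 8x8 product plus the partial sum and carry:
            # at most 255*255 + 255 + 255 = 65535, which fits in 16 bits.
            total = a_limb * b_limb + result[i + j] + carry
            result[i + j] = total & 0xFF   # low byte stays in place
            carry = total >> 8             # high byte moves to the next limb
        result[i + len(b)] = carry
    return result


if __name__ == "__main__":
    x, y = 0x1234, 0xBEEF
    product = multiply_limbs(to_limbs(x, 2), to_limbs(y, 2))
    assert from_limbs(product) == x * y
```

A 16-bit by 16-bit product here costs four 8-bit multiplies; in general an n-limb by m-limb product costs n times m of them, which is why the flexibility of a narrow multiplier matters for wider arithmetic.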


Cited By

  • (2005) The UCSC Kestrel Parallel Processor. IEEE Transactions on Parallel and Distributed Systems 16:1 (80-92). DOI: 10.1109/TPDS.2005.12. Online publication date: 1-Jan-2005
  • (2005) Optimizing neural networks on SIMD parallel computers. Parallel Computing 31:1 (97-115). DOI: 10.1016/j.parco.2004.11.002. Online publication date: 1-Jan-2005

Published In

ARVLSI '97: Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97)
September 1997
ISBN: 0818679131

Publisher

IEEE Computer Society, United States
