DOI: 10.1145/1508128.1508190
poster

Computation reuse in domain-specific optimization of signal recognition

Published: 22 February 2009

Abstract

Domain-specific optimizations that exploit specific arithmetic and representation formats have been shown to achieve significant performance/area gains in FPGA hardware designs. In this work, we describe an approach to domain-specific optimization that goes beyond the representation level: we jointly optimize the high-level mathematical representation and the hardware implementation. We focus on a signal recognition system that distinguishes between spoken digits. We construct transform matrices from Walsh wavelet packets in conjunction with a BestBasis algorithm. The resulting transform matrices have a rich algebraic structure with substantial overlap across rows, which creates considerable opportunity for computation reuse in the dot products of the transform matrix rows with the signal vector. We have developed an algorithm that identifies this reuse and schedules the row computations across multiple computation units to significantly reduce the overall amount of computation.
We have implemented a custom-built dot-product multiplication unit that exploits computation reuse, targeting a Virtex-II Pro FPGA device. A baseline dot-product multiplication unit, without reuse, reaches a maximum clock rate of 199.3 MHz while using only 2% of the device capacity. The optimized system that exploits reuse also includes a computation scheduler; it attains a clock rate of 196 MHz while using 8,183 slices (57%) of the FPGA device. Compared with the baseline implementation, the FPGA hardware implementation with a single pipelined dot-product unit reduces the amount of computation for an individual matrix by as much as 6.35× and by 2× on average. Although it is larger in area than the baseline, the implementation that exploits reuse still achieves a 2× computation reduction when compared with 3 concurrently executing simpler accumulation units occupying the same aggregate FPGA design area.
While the results in this paper reflect the opportunities of one specific signal processing problem, this work highlights the concept of exploiting computation reuse derived from a higher-level abstract representation at both the mathematical and hardware levels. As such, we believe this approach can also be leveraged in other signal recognition problems with well-characterized computational structures and signal dictionaries.
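The idea of computation reuse across overlapping matrix rows can be illustrated with a small software sketch. This is not the authors' identification or scheduling algorithm, which the abstract does not detail; it is a minimal illustration under stated assumptions: a Sylvester-ordered Walsh-Hadamard matrix stands in for the Walsh wavelet packet transform matrices, rows are cut into fixed-length segments, and a partial dot product is reused whenever the same segment (or its negation) has already been evaluated at the same column offset. The names `hadamard`, `matvec_with_reuse`, and `seg_len` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def hadamard(n: int) -> List[List[int]]:
    """Build an n x n Walsh-Hadamard matrix (n a power of two) with +1/-1 entries."""
    h = [[1]]
    while len(h) < n:
        # Sylvester construction: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def matvec_with_reuse(
    matrix: Sequence[Sequence[int]], signal: Sequence[int], seg_len: int
) -> Tuple[List[int], int, int]:
    """Multiply matrix by signal, reusing partial sums of repeated row segments.

    Returns (result, segments_computed, segments_total).
    """
    # Cache key: (column offset, segment coefficients) -> partial dot product.
    cache: Dict[Tuple[int, Tuple[int, ...]], int] = {}
    computed = 0
    total = 0
    result: List[int] = []
    for row in matrix:
        acc = 0
        for start in range(0, len(row), seg_len):
            seg = tuple(row[start:start + seg_len])
            neg = tuple(-v for v in seg)
            total += 1
            if (start, seg) in cache:
                partial = cache[(start, seg)]       # identical segment seen before
            elif (start, neg) in cache:
                partial = -cache[(start, neg)]      # negated segment seen before
            else:
                window = signal[start:start + seg_len]
                partial = sum(c * x for c, x in zip(seg, window))
                cache[(start, seg)] = partial
                computed += 1
            acc += partial
        result.append(acc)
    return result, computed, total


if __name__ == "__main__":
    m = hadamard(8)
    x = [3, 1, 4, 1, 5, 9, 2, 6]  # arbitrary example signal
    y, computed, total = matvec_with_reuse(m, x, seg_len=4)
    print(y)
    print(f"{computed} of {total} segment products computed")
```

In this toy setting the saving comes purely from the block structure of the matrix; the figures reported in the abstract refer to the authors' hardware design and matrices, not to this sketch.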


Published In

FPGA '09: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
February 2009
302 pages
ISBN: 9781605584102
DOI: 10.1145/1508128
  • General Chair: Paul Chow
  • Program Chair: Peter Cheung

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computation reuse
  2. fpga
  3. signal recognition

Qualifiers

  • Poster

Conference

FPGA '09
Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%
