DOI: 10.1145/1403375.1403521
Research article

Generic multi-phase software-pipelined Partial-FFT on instruction-level-parallel architectures and SDR baseband applications

Published: 10 March 2008

Abstract

The PFFT (Partial FFT) is an extended FFT in which only part of the input or output bins are used. By pruning the useless dataflow, the PFFT can potentially achieve a significant speedup in many important applications. Although the theoretical aspects of the PFFT have been thoroughly studied over the past three decades, efficient implementations have rarely been reported. The most important obstacle is the highly irregular dataflow and the associated control flow. In addition, because each of the N bins can be either kept or pruned, a size-N PFFT has 2^N dataflow possibilities, which makes delivering both flexibility and efficiency in the same implementation very challenging. This paper presents a generic scheme to map the highly irregular dataflow of an arbitrary PFFT onto ILP (instruction-level-parallel) architectures with highly efficient SWP (SoftWare-Pipelining). Constraints and opportunities of both the algorithms and the architecture are carefully analyzed and exploited. We introduce a multi-phase partitioning that brings heterogeneous control structures and heterogeneous software-pipelining schemes to minimize control overheads and maximize the efficiency of SWP. The proposal has been tested with 10 representative benchmarks extracted from baseband applications. In the experiments, cycle counts, instruction counts, NOPs, and L1D/L1P accesses, misses, and hits are thoroughly analyzed. Compared with full FFTs with efficient SWP, our work reduces cycle counts by 20.5% to 87.5%, instructions by 11.2% to 86.5%, L1D cache accesses by 16.1% to 79.4%, and L1P cache accesses by 19.5% to 87.1%. To the best of our knowledge, this is the first reported work on a generic software-pipelined PFFT on ILP architectures.
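The basic idea of pruning the useless dataflow can be illustrated with a minimal Python sketch. It assumes a radix-2 decimation-in-time FFT of power-of-two length and prunes only output bins. The function name `pruned_fft` and the `stats` counter are hypothetical illustrations; this is a textbook output-pruning recursion, not the paper's multi-phase, software-pipelined scheme. For simplicity, each computed output gets its own twiddle multiplication (no butterfly sharing), so a full size-N transform counts N*log2(N) multiplications rather than (N/2)*log2(N).

```python
import cmath


def pruned_fft(x, wanted, stats=None):
    """Compute only the DFT bins listed in `wanted`; return them as {k: X[k]}.

    Radix-2 decimation-in-time with output pruning: at each level, only the
    bins of the two half-size sub-DFTs that feed a wanted output are computed.
    Assumes len(x) is a power of two. If `stats` is a dict, the number of
    twiddle multiplications performed is accumulated in stats["twiddle_mults"].
    """
    n = len(x)
    wanted = set(wanted)
    if n == 1:
        # A length-1 DFT is the sample itself.
        return {0: x[0]} if 0 in wanted else {}
    half = n // 2
    # X[k] depends only on bin (k mod n/2) of the even and odd sub-DFTs,
    # so every other bin of those sub-DFTs is pruned away.
    needed = {k % half for k in wanted}
    even = pruned_fft(x[0::2], needed, stats)
    odd = pruned_fft(x[1::2], needed, stats)
    out = {}
    for k in wanted:
        twiddle = cmath.exp(-2j * cmath.pi * k / n)  # W_n^k
        out[k] = even[k % half] + twiddle * odd[k % half]
        if stats is not None:
            stats["twiddle_mults"] += 1
    return out


if __name__ == "__main__":
    x = [complex(i % 5, (3 * i) % 7) for i in range(16)]
    full, part = {"twiddle_mults": 0}, {"twiddle_mults": 0}
    pruned_fft(x, range(16), full)  # all 16 bins
    pruned_fft(x, [0], part)        # only bin 0
    print(full["twiddle_mults"], part["twiddle_mults"])
```

For N = 16, requesting all bins counts 64 twiddle multiplications, while requesting only bin 0 counts 15. Note how the set `needed` changes with every level and with every choice of output bins: this kind of data-dependent, irregular control flow is what makes an efficient software-pipelined PFFT hard to write by hand.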

Cited By

  • (2009) Generating high performance pruned FFT implementations. Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 549-552. DOI: 10.1109/ICASSP.2009.4959642. Online publication date: 19-Apr-2009.

Published In

DATE '08: Proceedings of the conference on Design, automation and test in Europe
March 2008
1575 pages
ISBN: 9783981080131
DOI: 10.1145/1403375
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DATE '08: Design, Automation and Test in Europe
March 10-14, 2008
Munich, Germany
Sponsors:
  • EDAA
  • SIGDA
  • The Russian Academy of Sciences

Acceptance Rates

Overall Acceptance Rate 518 of 1,794 submissions, 29%
