DOI: 10.1145/341800.341821

Multithreaded algorithms for the fast Fourier transform

Published: 09 July 2000
    Abstract

    In this paper we present fine-grained multithreaded algorithms and implementations for the Fast Fourier Transform (FFT) problem. The FFT problem has been formulated using two distinct approaches based on dataflow concepts. The first approach, referred to as the receiver-initiated algorithm, realizes the FFT iterations as a parent-child relationship while fully exploiting the underlying parallelism. The second approach, referred to as the sender-initiated algorithm, follows a dataflow model based on the producer-consumer style of programming and can be adapted to different architectural parameters to achieve high performance. The proposed algorithms have been implemented on the EARTH (Efficient Architecture for Running THreads) platform. For both algorithms, we analyze the ratio of remote to local threads and study its impact on the experimental results. Our implementation results show that, for certain block sizes at a fixed problem size and machine size, the receiver-initiated approach performs better than the sender-initiated approach. For a large number of processors, both algorithms perform well, yielding execution times of only 10 msec for an input of 16 K data points on a 64-processor machine, assuming each processor runs at a 140 MHz clock speed.
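
    The producer-consumer idea behind the sender-initiated formulation can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the paper's EARTH implementation, which does not appear on this page. It assumes a radix-2 decimation-in-time FFT in which every butterfly is treated as a tiny thread that becomes ready once both of its inputs have been delivered. The names `dataflow_fft`, `pending`, `ready`, and the `lifo` flag are hypothetical and chosen for illustration.

```python
import cmath
from collections import deque


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dataflow_fft(x, lifo=False):
    """Radix-2 FFT where butterflies fire as soon as their inputs arrive.

    Illustrative assumption: each butterfly is a "thread" with a sync
    counter of 2. A finished producer delivers its two outputs and
    decrements the counters of the consumers that need them.
    """
    n = len(x)
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("input length must be a power of two, at least 2")

    # values[s][i] is element i at the input of stage s;
    # values[stages] is the final spectrum.
    values = [[None] * n for _ in range(stages + 1)]
    for i, v in enumerate(x):
        values[0][bit_reverse(i, stages)] = complex(v)

    pending = {}     # (stage, lo) -> number of inputs still missing
    ready = deque()  # butterflies whose inputs are all present
    for s in range(stages):
        for lo in range(n):
            if lo & (1 << s):
                continue  # lo must have bit s clear; its partner is lo + 2**s
            if s == 0:
                ready.append((s, lo))  # stage-0 inputs are the raw data
            else:
                pending[(s, lo)] = 2

    while ready:
        # The scheduling order is arbitrary: only data availability matters.
        s, lo = ready.pop() if lifo else ready.popleft()
        half = 1 << s
        hi = lo + half
        w = cmath.exp(-2j * cmath.pi * (lo % half) / (2 * half))  # twiddle
        a, b = values[s][lo], values[s][hi]
        values[s + 1][lo] = a + w * b
        values[s + 1][hi] = a - w * b
        if s + 1 < stages:
            # Producer side: signal the two consumer butterflies.
            for j in (lo, hi):
                consumer = (s + 1, j & ~(1 << (s + 1)))
                pending[consumer] -= 1
                if pending[consumer] == 0:
                    ready.append(consumer)
    return values[stages]
```

    Calling `dataflow_fft(samples)` and `dataflow_fft(samples, lifo=True)` should give the same spectrum up to floating-point rounding, which is the point of the sketch: the result depends only on when data becomes available, not on the order in which ready butterflies are scheduled. A real multithreaded runtime would additionally place butterflies on processors, which is where the remote versus local thread ratio discussed in the abstract comes in; that part is not modeled here.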

    Published In

    SPAA '00: Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
    July 2000
    224 pages
    ISBN: 1581131852
    DOI: 10.1145/341800
    Chairmen: Gary Miller, Shang-Hua Teng
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dataflow architecture
    2. fine-grained
    3. multithreading
    4. non-preemptive
    5. parallel algorithms

    Qualifiers

    • Article

    Conference

    SPAA00

    Acceptance Rates

    SPAA '00 Paper Acceptance Rate 24 of 45 submissions, 53%;
    Overall Acceptance Rate 447 of 1,461 submissions, 31%

    Article Metrics

    • Downloads (last 12 months): 101
    • Downloads (last 6 weeks): 9

    Cited By

    • (2013) Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models. Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp. 1607-1617. DOI: 10.1109/IPDPSW.2013.47. Online publication date: 20-May-2013
    • (2008) Exploiting Data Locality in FFT Using Indirect Swap Network on Cell/B.E. Proceedings of the 2008 22nd International Symposium on High Performance Computing Systems and Applications, pp. 88-94. DOI: 10.1109/HPCS.2008.11. Online publication date: 9-Jun-2008
    • (2007) Model-Guided Empirical Optimization for Multimedia Extension Architectures: A Case Study. 2007 IEEE International Parallel and Distributed Processing Symposium, pp. 1-8. DOI: 10.1109/IPDPS.2007.370641. Online publication date: Mar-2007
    • (2007) Optimizing the Fast Fourier Transform on a Multi-core Architecture. 2007 IEEE International Parallel and Distributed Processing Symposium, pp. 1-8. DOI: 10.1109/IPDPS.2007.370639. Online publication date: Mar-2007
    • (2007) Performance portability on EARTH: a case study across several parallel architectures. Cluster Computing 10:2, pp. 115-126. DOI: 10.1007/s10586-007-0011-1. Online publication date: 1-Jun-2007
    • (2006) Hybrid MPI/Pthread Implementation of 1-D FFT on SMP. 2006 Proceeding of the Thrity-Eighth Southeastern Symposium on System Theory, pp. 367-370. DOI: 10.1109/SSST.2006.1619106. Online publication date: 2006
    • (2004) Improving data locality in parallel fast fourier transform algorithm for pricing financial derivatives. 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings, pp. 235-240. DOI: 10.1109/IPDPS.2004.1303283. Online publication date: 2004
    • (2003) A distributed implementation of fast Fourier transform on indirect swap networks. CCECE 2003 - Canadian Conference on Electrical and Computer Engineering. Toward a Caring and Humane Technology (Cat. No.03CH37436), pp. 1147-1150. DOI: 10.1109/CCECE.2003.1226100. Online publication date: 2003
    • (2003) Performance Evaluation of a Multithreaded Fast Fourier Transform Algorithm for Derivative Pricing. The Journal of Supercomputing 26:1, pp. 43-58. DOI: 10.1023/A:1024464001273. Online publication date: 1-Aug-2003
