Towards an Optimal Bit-Reversal Permutation Program

Published: 08 November 1998

Abstract

The speed of many computations is limited not by the number of arithmetic operations but by the time it takes to move and rearrange data in the increasingly complicated memory hierarchies of modern computers. Array transpose and the bit-reversal permutation -- trivial operations on a RAM -- present non-trivial problems when designing highly tuned scientific library functions, particularly for the Fast Fourier Transform. We prove a precise bound for RoCol, a simple pebble-type game that is relevant to implementing these permutations. We use RoCol to give lower bounds on the amount of memory traffic in a computer with four levels of memory (registers, cache, TLB, and memory), taking into account such "messy" features as block moves and set-associative caches. The insights from this analysis lead to a bit-reversal algorithm whose performance is close to the theoretical minimum. Experiments show it performs significantly better than every program in a comprehensive study of 30 published algorithms.
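
For readers unfamiliar with the permutation the abstract refers to, the following is a minimal sketch of the textbook bit-reversal permutation. It is not the cache-aware algorithm the paper develops. The function names `bit_reverse` and `bit_reversal_permutation` are hypothetical and chosen only for illustration. The element at index i is swapped with the element whose index is i with its log2(n) bits written in reverse order.

```python
def bit_reverse(i: int, bits: int) -> int:
    """Return i with its low `bits` bits written in reverse order."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # shift the lowest bit of i into r
        i >>= 1
    return r


def bit_reversal_permutation(a: list) -> list:
    """Return a copy of `a` permuted by bit reversal of the indices.

    Naive version for illustration only: it ignores the memory
    hierarchy entirely, which is the cost the paper is concerned with.
    """
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1  # log2(n) for a power of two
    out = list(a)
    for i in range(n):
        j = bit_reverse(i, bits)
        if i < j:  # swap each pair only once
            out[i], out[j] = out[j], out[i]
    return out


print(bit_reversal_permutation(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

On a flat RAM this loop is trivial, but the indices i and j are typically far apart in memory, so on a real machine successive swaps tend to touch different cache lines and pages. That access pattern is the kind of memory traffic the abstract's lower bounds address.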

Published In

FOCS '98: Proceedings of the 39th Annual Symposium on Foundations of Computer Science
November 1998
ISBN:0818691727

Publisher

IEEE Computer Society

United States

Author Tags

  1. Cache
  2. FFT
  3. Memory hierarchy
  4. Permutations
  5. Transpose

Qualifiers

  • Article

Cited By

  • (2012) Cache-Oblivious Algorithms. ACM Transactions on Algorithms 8:1 (1-22). DOI: 10.1145/2071379.2071383. Online publication date: 1-Jan-2012
  • (2008) Algorithms and data structures for external memory. Foundations and Trends® in Theoretical Computer Science 2:4 (305-474). DOI: 10.1561/0400000014. Online publication date: 1-Jan-2008
  • (2008) On the limits of cache-oblivious rational permutations. Theoretical Computer Science 402:2-3 (221-233). DOI: 10.1016/j.tcs.2008.04.036. Online publication date: 20-Jul-2008
  • (2007) Optimal bit-reversal using vector permutations. Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures (198-199). DOI: 10.1145/1248377.1248411. Online publication date: 9-Jun-2007
  • (2006) Combining analytical and empirical approaches in tuning matrix transposition. Proceedings of the 15th international conference on Parallel architectures and compilation techniques (233-242). DOI: 10.1145/1152154.1152190. Online publication date: 16-Sep-2006
  • (2003) Online paging with arbitrary associativity. Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms (555-564). DOI: 10.5555/644108.644202. Online publication date: 12-Jan-2003
  • (2002) External memory algorithms. Handbook of massive data sets (359-416). DOI: 10.5555/779232.779243. Online publication date: 1-Jan-2002
  • (2001) External memory algorithms and data structures. ACM Computing Surveys 33:2 (209-271). DOI: 10.1145/384192.384193. Online publication date: 1-Jun-2001
  • (2000) Towards a theory of cache-efficient algorithms. Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms (829-838). DOI: 10.5555/338219.338646. Online publication date: 1-Feb-2000
  • (1999) Cache-Oblivious Algorithms. Proceedings of the 40th Annual Symposium on Foundations of Computer Science. DOI: 10.5555/795665.796479. Online publication date: 17-Oct-1999