Streaming Sorting Networks

Published: 27 May 2016

Abstract

Sorting is a fundamental problem in computer science and has been studied extensively. Thus, a large variety of sorting methods exist for both software and hardware implementations. For the latter, there is a trade-off between the throughput achieved and the cost (i.e., the logic and storage invested to sort $n$ elements). Two popular solutions are bitonic sorting networks with $O(n \log^2 n)$ logic and storage, which sort $n$ elements per cycle, and linear sorters with $O(n)$ logic and storage, which sort $n$ elements per $n$ cycles. In this article, we present new hardware structures that we call streaming sorting networks, which we derive through a mathematical formalism that we introduce, and an accompanying domain-specific hardware generator that translates our formal mathematical description into synthesizable RTL Verilog. With the new networks, we achieve novel and improved cost-performance trade-offs. For example, assuming that $n$ is a power of two and $w$ is any divisor of $n$, one class of these networks can sort in $n/w$ cycles with $O(w \log^2 n)$ logic and $O(n \log^2 n)$ storage; the other class that we present sorts in $(n \log^2 n)/w$ cycles with $O(w)$ logic and $O(n)$ storage. We carefully analyze the performance of these networks and their cost at three levels of abstraction: (1) asymptotically, (2) exactly in terms of the number of basic elements needed, and (3) in terms of the resources required by the actual circuit when mapped to a field-programmable gate array. The accompanying hardware generator allows us to explore the entire design space, identify the Pareto-optimal solutions, and show superior cost-performance trade-offs compared to prior work.
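
To make the cost figures in the abstract concrete, the following is a minimal software sketch of a classical bitonic sorting network, the fully parallel baseline the abstract compares against. It is not the article's generator or its streaming architecture. The function names (`bitonic_network`, `apply_network`, `folded_comparator_count`) are hypothetical, and the folded count rests on a simplifying assumption stated in the code: that each stage of the network is reused over n/w cycles with w/2 comparators.

```python
def bitonic_network(n):
    """Return the stages of a bitonic sorting network for n inputs.

    n must be a power of two. Each stage is a list of
    (i, j, ascending) triples; all comparators within one stage
    touch disjoint wires and could operate in parallel.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = []
    k = 2                      # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2             # distance between compared wires
        while j >= 1:
            stage = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate direction until the final merge.
                    ascending = (i & k) == 0
                    stage.append((i, partner, ascending))
            stages.append(stage)
            j //= 2
        k *= 2
    return stages


def apply_network(stages, values):
    """Run the compare-exchange stages on a copy of values."""
    data = list(values)
    for stage in stages:
        for i, j, ascending in stage:
            # Swap when the pair is out of order for its direction.
            if (data[i] > data[j]) == ascending:
                data[i], data[j] = data[j], data[i]
    return data


def folded_comparator_count(n, w):
    """Illustrative comparator count when w elements enter per cycle.

    ASSUMPTION (not taken from the article): every stage is folded
    onto w/2 comparators that are reused for n/w cycles.
    """
    if w < 2 or w % 2 or n % w:
        raise ValueError("w must be an even divisor of n")
    return (w // 2) * len(bitonic_network(n))


if __name__ == "__main__":
    net = bitonic_network(8)
    print(len(net), sum(len(s) for s in net))   # stages, comparators
    print(apply_network(net, [5, 2, 7, 0, 3, 6, 1, 4]))
    print(folded_comparator_count(8, 2))
```

For n = 8 the network has 6 stages of 4 comparators each, 24 in total, which matches the n log n (log n + 1)/4 count behind the $O(n \log^2 n)$ figure. Under the folding assumption the comparator count drops to (w/2) times the number of stages, which is in line with the $O(w \log^2 n)$ logic quoted for the first class of networks; the exact costs, including the storage needed for the permutations between stages, are those analyzed in the article itself.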

Information & Contributors

Information

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 21, Issue 4
September 2016
423 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/2939671
Editor: Naehyuck Chang
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2016
Accepted: 01 November 2015
Received: 01 October 2015
Published in TODAES Volume 21, Issue 4

Author Tags

  1. HDL generation
  2. Hardware sorting
  3. Design space exploration

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 37
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 13 Jan 2025

Citations

Cited By

  • (2024) Mentor: A Memory-Efficient Sparse-dense Matrix Multiplication Accelerator Based on Column-Wise Product. ACM Transactions on Architecture and Code Optimization 21:4 (1-25). DOI: 10.1145/3688612. Online publication date: 20-Nov-2024
  • (2024) Compact FPGA-Based Data Acquisition System for a High-Channel, High-Count-Rate TOF-PET Insert for Brain PET/MRI. IEEE Transactions on Instrumentation and Measurement 73 (1-9). DOI: 10.1109/TIM.2023.3328091. Online publication date: 2024
  • (2024) DL-Sort: A Hybrid Approach to Scalable Hardware-Accelerated Fully-Streaming Sorting. IEEE Transactions on Circuits and Systems II: Express Briefs 71:5 (2549-2553). DOI: 10.1109/TCSII.2024.3377255. Online publication date: May-2024
  • (2024) A Low-Cost Pipelined Architecture Based on a Hybrid Sorting Algorithm. IEEE Transactions on Circuits and Systems I: Regular Papers 71:2 (717-730). DOI: 10.1109/TCSI.2023.3342929. Online publication date: Feb-2024
  • (2024) PRIMATE: Processing in Memory Acceleration for Dynamic Token-pruning Transformers. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) (557-563). DOI: 10.1109/ASP-DAC58780.2024.10473968. Online publication date: 22-Jan-2024
  • (2023) A Hardware Design Generator of High-Performance FIFO-Based Linear Insertion Streaming Sorters. 2023 30th International Conference on Mixed Design of Integrated Circuits and System (MIXDES) (79-82). DOI: 10.23919/MIXDES58562.2023.10203246. Online publication date: 29-Jun-2023
  • (2023) Supply Chain Aware Computer Architecture. Proceedings of the 50th Annual International Symposium on Computer Architecture (1-15). DOI: 10.1145/3579371.3589052. Online publication date: 17-Jun-2023
  • (2023) Sky-Sorter: A Processing-in-Memory Architecture for Large-Scale Sorting. IEEE Transactions on Computers 72:2 (480-493). DOI: 10.1109/TC.2022.3169434. Online publication date: 1-Feb-2023
  • (2023) Redwood: Flexible and Portable Heterogeneous Tree Traversal Workloads. 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (201-213). DOI: 10.1109/ISPASS57527.2023.00028. Online publication date: Apr-2023
  • (2023) Duet: Creating Harmony between Processors and Embedded FPGAs. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (745-758). DOI: 10.1109/HPCA56546.2023.10070989. Online publication date: Feb-2023
