
Parallelizing Data Processing on FPGAs with Shifter Lists

Published: 31 March 2015

    Abstract

    Parallelism is currently seen as a mechanism to minimize the impact of the power and heat dissipation problems encountered in modern hardware. Data parallelism—based on partitioning the data—and pipeline parallelism—based on partitioning the computation—are the two main approaches to leverage parallelism on a wide range of hardware platforms.
    Unfortunately, not all data processing problems are amenable to either of those strategies. An example is the skyline operator [Börzsönyi et al. 2001], which computes the set of Pareto-optimal points within a multidimensional dataset. Existing approaches to parallelizing the skyline operator are based on data parallelism. As a result, they suffer from a high overhead when merging intermediate results, because partitioning the input data deprives each partition of a global view of the problem.
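    To make the skyline operator and the merge overhead concrete, the following is a minimal software sketch, not the authors' FPGA design. It assumes that smaller values are better in every dimension, and the names `dominates`, `skyline`, and `partitioned_skyline` are hypothetical, chosen only for illustration. The partitioned variant mimics a data-parallel scheme: each partition computes a local skyline without a global view, so the merge step still has to discard points that only looked optimal locally.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when it is
    no worse in all dimensions and strictly better in at least one.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Compute the Pareto-optimal points with a simple window-based scan."""
    window: List[Point] = []
    for p in points:
        # Drop p if a point already in the window dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise evict every window point that p dominates, then keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def partitioned_skyline(points: Sequence[Point], parts: int) -> List[Point]:
    """Data-parallel style: local skylines per partition, then a merge pass."""
    chunks = [points[i::parts] for i in range(parts)]
    local_results = [skyline(chunk) for chunk in chunks]
    # Local results may contain points dominated by another partition's
    # points, so the merge has to run the skyline computation once more.
    merged = [p for result in local_results for p in result]
    return skyline(merged)
```

    For the input `[(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2, 6)]` split into two partitions, the point `(4, 4)` survives in one local skyline and is only eliminated during the merge, which is the kind of redundant work the abstract attributes to purely data-parallel approaches.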
    In this article, we show how to combine pipeline parallelism with data parallelism on a Field-Programmable Gate Array (FPGA) to use the available hardware parallelism more efficiently. As we show in our experiments, skyline computation using our proposed technique scales linearly with the number of processing elements, and the performance we achieve on a rather small FPGA is comparable to that of a 64-core high-end server running a state-of-the-art data-parallel implementation of skyline [Park et al. 2009].
    The proposed approach to parallelizing the skyline operator can be generalized to a wider range of data processing problems. We demonstrate this through a novel, highly parallel data structure, the shifter list, that can be efficiently implemented on an FPGA. The resulting template is easy to parameterize for implementing a variety of computationally intensive operators such as frequent items, n-closest pairs, or K-means.

    References

    [1] Ray Bittner. 2009. The speedy DDR2 controller for FPGAs. In Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA’09).
    [2] Shekhar Borkar and Andrew A. Chien. 2011. The future of microprocessors. Commun. ACM 54, 5 (May 2011).
    [3] Stephan Börzsönyi, Donald Kossmann, and Konrad Stocker. 2001. The skyline operator. In Proceedings of the 17th International Conference on Data Engineering (ICDE’01).
    [4] Sung-Ryoung Cho, Jongwuk Lee, Seung-Won Hwang, Hwansoo Han, and Sang-Won Lee. 2010. Vskyline: Vectorization for efficient skyline computation. SIGMOD Rec. 39, 2 (Dec. 2010).
    [5] Eric S. Chung, James C. Hoe, and Ken Mai. 2011. Coram: An in-fabric memory architecture for FPGA-based computing. In Proceedings of the 19th ACM SIGDA International Symposium on Field Programmable Gate Arrays (FPGA’11).
    [6] Convey Computer. 2014. Convey HC-2. Retrieved from http://www.conveycomputer.com.
    [7] Christopher Dennl, Daniel Ziener, and Jürgen Teich. 2012. On-the-fly composition of FPGA-based SQL query accelerators using a partially reconfigurable module library. In Proceedings of the 20th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’12).
    [8] Petros Drineas, Alan M. Frieze, Ravi Kannan, Santosh S. Vempala, and V. Vinay. 2004. Clustering large graphs via the singular value decomposition. Mach. Learn. 56, 1-3 (June 2004).
    [9] Ken Eguro. 2010. SIRC: An extensible reconfigurable computing communication API. In Proceedings of the 18th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’10).
    [10] Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In Proceedings of the 38th Symposium on Computer Architecture (ISCA’11).
    [11] Parke Godfrey, Ryan Shipley, and Jarek Gryz. 2005. Maximal vector computation in large data sets. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB’05).
    [12] Amir Hormati, Manjunath Kudlur, Scott Mahlke, David Bacon, and Rodric Rabbah. 2008. Optimus: Efficient realization of streaming applications on FPGAs. In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES’08).
    [13] IBM. 2014. IBM Netezza Data Warehouse Appliances. Retrieved from http://www.ibm.com/software/data/netezza.
    [14] Hiroaki Inoue, Takashi Takenaka, and Masato Motomura. 2011. 20Gbps C-based complex event processing. In Proceedings of the 21st International Conference on Field Programmable Logic and Applications (FPL’11).
    [15] Gilles Kahn. 1974. The semantics of a simple language for parallel programming. In IFIP Congress.
    [16] Dirk Koch and Jim Torresen. 2011. FPGASort: A high performance sorting architecture exploiting run-time reconfiguration on FPGAs for large problem sorting. In Proceedings of the 19th ACM SIGDA International Symposium on Field Programmable Gate Arrays (FPGA’11).
    [17] Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. 2006. An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Trans. Database Syst. (TODS) 31, 3 (Sept. 2006).
    [18] Roger Moussalli, Mariam Salloum, Walid A. Najjar, and Vassilis J. Tsotras. 2011. Massively parallel XML twig filtering using dynamic programming on FPGAs. In Proceedings of the 27th International Conference on Data Engineering (ICDE’11).
    [19] Sungwoo Park, Taekyung Kim, Jonghyun Park, Jinha Kim, and Hyeonseung Im. 2009. Parallel skyline computation on multicore architectures. In Proceedings of the 25th International Conference on Data Engineering (ICDE’09).
    [20] Parthasarathy Ranganathan. 2011. From microprocessors to nanostores: Rethinking data-centric systems. IEEE Comput. 44, 1 (Jan. 2011).
    [21] Satnam Singh. 2011. Computing without processors. Commun. ACM 54, 8 (Aug. 2011).
    [22] Bharat Sukhwani, Hong Min, Mathew Thoennes, Parijat Dube, Balakrishna Iyer, Bernard Brezzo, Donna Dillenberger, and Sameh Asaad. 2012. Database analytics acceleration using FPGAs. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT’12).
    [23] Jens Teubner, René Müller, and Gustavo Alonso. 2010. FPGA acceleration for the frequent item problem. In Proceedings of the 26th International Conference on Data Engineering (ICDE’10).
    [24] Riccardo Torlone and Paolo Ciaccia. 2002. Which are my preferred items. In Proceedings of the Workshop on Recommendation and Personalization in eCommerce (RPEC’02).



      Published In

      ACM Transactions on Reconfigurable Technology and Systems  Volume 8, Issue 2
      Special Section on FPL 2013
      April 2015
      129 pages
      ISSN:1936-7406
      EISSN:1936-7414
      DOI:10.1145/2746532
      Editor: Steve Wilton
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 31 March 2015
      Accepted: 01 April 2014
      Revised: 01 February 2014
      Received: 01 September 2013
      Published in TRETS Volume 8, Issue 2


      Author Tags

      1. K-means
      2. n-closest pairs
      3. FPGA
      4. database
      5. frequent items
      6. parallelism
      7. pipeline
      8. shifter list
      9. skyline query

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis.”
      • Enterprise Computing Center (ECC) at ETH Zürich (http://www.ecc.ethz.ch)


      Cited By

      • (2022) Near-memory Computing on FPGAs with 3D-stacked Memories: Applications, Architectures, and Optimizations. ACM Transactions on Reconfigurable Technology and Systems 16:1 (1-32). DOI: 10.1145/3547658. Online publication date: 22-Dec-2022
      • (2020) Hardware-Software XML-Documents Processing. Èlektronnoe modelirovanie 42:1 (33-50). DOI: 10.15407/emodel.42.01.033. Online publication date: 5-Feb-2020
      • (2018) BJR-tree: fast skyline computation algorithm using dominance relation-based tree structure. International Journal of Data Science and Analytics 7:1 (17-34). DOI: 10.1007/s41060-018-0098-x. Online publication date: 31-Jan-2018
      • (2018) Skyline Computation for Big Data. Data Science and Big Data Analytics (267-276). DOI: 10.1007/978-981-10-7641-1_23. Online publication date: 2-Aug-2018
      • (2017) Caribou. Proceedings of the VLDB Endowment 10:11 (1202-1213). DOI: 10.14778/3137628.3137632. Online publication date: 1-Aug-2017
      • (2017) Pipelining a triggered processing element. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (96-108). DOI: 10.1145/3123939.3124551. Online publication date: 14-Oct-2017
      • (2017) Exploiting Stable Data Dependency in Stream Processing Acceleration on FPGAs. ACM Transactions on Embedded Computing Systems 16:4 (1-26). DOI: 10.1145/3092950. Online publication date: 13-Jul-2017
      • (2017) BJR-Tree: Fast Skyline Computation Algorithm for Serendipitous Searching Problems. 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (272-282). DOI: 10.1109/DSAA.2017.15. Online publication date: Oct-2017
      • (2015) MEMOCODE 2015 design contest. Proceedings of the 2015 ACM/IEEE International Conference on Formal Methods and Models for Codesign (48-51). DOI: 10.1109/MEMCOD.2015.7340467. Online publication date: 1-Sep-2015
