DOI: 10.1145/3185768.3186300
research-article

ABench: Big Data Architecture Stack Benchmark

Published: 02 April 2018

Abstract

Distributed big data processing and analytics applications demand a comprehensive end-to-end architecture stack consisting of big data technologies. However, there are many possible architecture patterns (e.g., Lambda, Kappa, or Pipeline architectures) to choose from when implementing the application requirements. A big data technology may perform best for a particular application in isolation, but its performance in combination with other technologies depends on the connectors and the environment. Similarly, existing big data benchmarks evaluate the performance of different technologies in isolation, but no work has been done on benchmarking big data architecture stacks as a whole. For example, BigBench (TPCx-BB) may be used to evaluate the performance of Spark, but is it also applicable to PySpark or to a Spark-with-Kafka stack? What is the impact of having different programming environments and/or any other technology like Spark? This vision paper proposes a new category of benchmark, called ABench, to fill this gap and discusses key aspects necessary for the performance evaluation of different big data architecture stacks.

Published In

ICPE '18: Companion of the 2018 ACM/SPEC International Conference on Performance Engineering
April 2018
212 pages
ISBN:9781450356299
DOI:10.1145/3185768
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ABench
  2. big data
  3. big data benchmarking
  4. bigbench

Qualifiers

  • Research-article

Conference

ICPE '18

Acceptance Rates

Overall Acceptance Rate 252 of 851 submissions, 30%

Cited By

  • (2022) Big SQL systems: an experimental evaluation. Cluster Computing 22:4 (1347-1377). DOI: 10.1007/s10586-019-02914-4. Online publication date: 11-Mar-2022
  • (2022) Efficient Big Data Architecture Based on Micro Service. Advances in Computational Intelligence and Communication (63-77). DOI: 10.1007/978-3-031-19523-5_5. Online publication date: 14-Dec-2022
  • (2020) Tutorial on Benchmarking Big Data Analytics Systems. Companion of the ACM/SPEC International Conference on Performance Engineering (50-53). DOI: 10.1145/3375555.3383121. Online publication date: 20-Apr-2020
  • (2020) Big Data Pipeline with ML-Based and Crowd Sourced Dynamically Created and Maintained Columnar Data Warehouse for Structured and Unstructured Big Data. 2020 3rd International Conference on Information and Computer Technologies (ICICT) (60-67). DOI: 10.1109/ICICT50521.2020.00018. Online publication date: Mar-2020
  • (2019) An Architecture for the Real-Time Data Stream Monitoring in IoT. Multimedia Big Data Computing for IoT Applications (59-100). DOI: 10.1007/978-981-13-8759-3_3. Online publication date: 18-Jul-2019
  • (2019) The impact of columnar file formats on SQL-on-Hadoop engine performance: A study on ORC and Parquet. Concurrency and Computation: Practice and Experience 32:5. DOI: 10.1002/cpe.5523. Online publication date: 9-Sep-2019
