research-article

ABench: Big Data Architecture Stack Benchmark

Authors:

Todor Ivanov,

Rekha Singhal

ICPE '18: Companion of the 2018 ACM/SPEC International Conference on Performance Engineering

Pages 13 - 16

https://doi.org/10.1145/3185768.3186300

Published: 02 April 2018

Abstract

Distributed big data processing and analytics applications demand a comprehensive end-to-end architecture stack consisting of big data technologies. However, there are many possible architecture patterns (e.g., Lambda, Kappa, or Pipeline architectures) to choose from when implementing the application requirements. A big data technology may perform best in isolation for a particular application, but its performance in connection with other technologies depends on the connectors and the environment. Similarly, existing big data benchmarks evaluate the performance of different technologies in isolation, but no work has been done on benchmarking big data architecture stacks as a whole. For example, BigBench (TPCx-BB) may be used to evaluate the performance of Spark, but is it also applicable to PySpark or to a Spark with Kafka stack? What is the impact of having different programming environments and/or any other technology like Spark? This vision paper proposes a new category of benchmark, called ABench, to fill this gap and discusses the key aspects necessary for evaluating the performance of different big data architecture stacks.


Published In

ICPE '18: Companion of the 2018 ACM/SPEC International Conference on Performance Engineering

April 2018

212 pages

ISBN: 9781450356299

DOI: 10.1145/3185768

General Chairs:
Katinka Wolter (Free University of Berlin, Germany),
Will Knottenbelt (Imperial College London, UK)

Program Chairs:
André van Hoorn (University of Stuttgart, Germany),
Manoj Nambiar (Tata Consultancy Services, India),
Heiko Koziolek (ABB, Germany)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICPE '18: ACM/SPEC International Conference on Performance Engineering

April 9 - 13, 2018

Berlin, Germany

Acceptance Rates

Overall Acceptance Rate 252 of 851 submissions, 30%
