tutorial

Tutorial on Benchmarking Big Data Analytics Systems

Authors:

Rekha Singhal

ICPE '20: Companion of the ACM/SPEC International Conference on Performance Engineering

Pages 50 - 53

https://doi.org/10.1145/3375555.3383121

Published: 20 April 2020

Abstract

The proliferation of big data technology and faster computing systems has made AI-based solutions pervasive in our lives. There is a need to understand how to benchmark the systems used to build these solutions, which involve a complex pipeline of pre-processing, statistical analysis, machine learning, and deep learning on data to build prediction models. Solution architects, engineers, and researchers may choose open-source technology or proprietary systems depending on the desired performance requirements. The performance metrics of interest may include data pre-processing time, model training time, and model inference time. No single benchmark answers all the questions of solution architects and researchers. This tutorial covers both practical and research questions on relevant Big Data and Analytics benchmarks.
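
As an illustration of the three metrics named in the abstract (data pre-processing time, model training time, and model inference time), the following is a minimal sketch of per-stage timing. It is not taken from the tutorial: the toy pipeline (min-max scaling followed by a least-squares line fit) and the names `timed`, `preprocess`, `train`, `infer`, and `run_benchmark` are hypothetical, chosen only so the example runs with the Python standard library.

```python
import time
from statistics import mean


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical toy stages standing in for a real analytics pipeline.
def preprocess(rows):
    """Min-max scale the feature of each (x, y) row into [0, 1]."""
    xs = [x for x, _ in rows]
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0  # avoid division by zero on constant input
    return [((x - lo) / span, y) for x, y in rows]


def train(rows):
    """Fit y = a*x + b by ordinary least squares; return (a, b)."""
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def infer(model, xs):
    """Apply the fitted line to a list of feature values."""
    a, b = model
    return [a * x + b for x in xs]


def run_benchmark(n=10_000):
    """Time each stage separately on synthetic data of size n."""
    raw = [(float(i), 2.0 * i + 1.0) for i in range(n)]
    data, t_pre = timed(preprocess, raw)
    model, t_train = timed(train, data)
    preds, t_infer = timed(infer, model, [x for x, _ in data])
    return {
        "preprocess_s": t_pre,
        "train_s": t_train,
        "infer_s": t_infer,
        "predictions": len(preds),
    }


if __name__ == "__main__":
    print(run_benchmark())
```

Reporting the stages separately, rather than one end-to-end number, reflects the abstract's point that different users care about different metrics. A realistic benchmark would also repeat runs and control for data size and hardware.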

Cited By

P. Gallo, J. Kollman, J. Pavlinska, and J. Dobrovic. 2024. KPIs and BSC in the SME segment. Myth or reality? Journal of Business Sectors 2, 1 (2024), 1--10. https://doi.org/10.62222/YTKL9850

Index Terms

Tutorial on Benchmarking Big Data Analytics Systems
1. Computer systems organization
  1. Architectures
    1. Distributed architectures

Published In

ICPE '20: Companion of the ACM/SPEC International Conference on Performance Engineering

April 2020

65 pages

ISBN:9781450371094

DOI:10.1145/3375555

General Chairs:
J. Nelson Amaral, University of Alberta, Canada
Anne Koziolek, Karlsruhe Institute of Technology (KIT), Germany

Program Chairs:
Catia Trubiani, Gran Sasso Science Institute (GSSI), Italy
Alexandru Iosup, VU Amsterdam, Netherlands

Copyright © 2020 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial

Funding Sources

European Commission H2020 project DataBench

Conference

ICPE '20

ICPE '20: ACM/SPEC International Conference on Performance Engineering

April 25 - 30, 2020

Edmonton AB, Canada

Acceptance Rates

Overall Acceptance Rate 252 of 851 submissions, 30%

Bibliometrics

Total Citations: 1
Total Downloads: 231
Downloads (Last 12 months): 15
Downloads (Last 6 weeks): 2

Reflects downloads up to 13 Jan 2025
