DOI: 10.1145/3375555.3383121
tutorial

Tutorial on Benchmarking Big Data Analytics Systems

Published: 20 April 2020

Abstract

The proliferation of big data technology and faster computing systems has made AI-based solutions pervasive in everyday life. There is a need to understand how to benchmark the systems used to build these solutions, which rely on a complex pipeline of pre-processing, statistical analysis, machine learning, and deep learning on data to build prediction models. Solution architects, engineers, and researchers may use open-source technology or proprietary systems, depending on their performance requirements. The performance metrics of interest may include data pre-processing time, model training time, and model inference time. We do not see a single benchmark that answers all the questions of solution architects and researchers. This tutorial covers both practical and research questions on relevant big data and analytics benchmarks.

Cited By

  • (2024) KPIs and BSC in the SME segment. Myth or reality? Journal of Business Sectors 2:1 (1-10). DOI: 10.62222/YTKL9850. Online publication date: 30-Jun-2024

    Published In

    ICPE '20: Companion of the ACM/SPEC International Conference on Performance Engineering
    April 2020
    65 pages
    ISBN: 9781450371094
    DOI: 10.1145/3375555
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. AI
    2. ML
    3. analytics
    4. benchmarking
    5. big data

    Qualifiers

    • Tutorial

    Funding Sources

    • European Commission H2020 project DataBench

    Conference

    ICPE '20

    Acceptance Rates

    Overall Acceptance Rate 252 of 851 submissions, 30%

    Article Metrics

    • Downloads (Last 12 months): 15
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 13 Jan 2025
