Profiling dataflow systems on multiple abstraction levels

Authors: Alexander Beischl, Maximilian Bandle, Thomas Neumann

EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems

Pages 474 - 489

https://doi.org/10.1145/3447786.3456254

Published: 21 April 2021

Abstract

Dataflow graphs are a popular abstraction for describing computation, used in many systems for high-level optimization. For execution, dataflow graphs are lowered and optimized through layers of program representations down to machine instructions. Unfortunately, performance profiling of such systems is cumbersome, as today's profilers present results merely at instruction and function granularity. This obscures the connection between profiles and high-level constructs, such as operators and pipelines, making interpretation of profiles an exercise in puzzling and deduction.

In this paper, we show how to profile compiling dataflow systems at higher abstraction levels. Our approach tracks the code generation process and aggregates profiling data to any abstraction level. This bridges the semantic gap to match the engineer's current information need and even creates a comprehensible way to report timing information within profiling data. We have evaluated this approach within our compiling DBMS Umbra, showing that the approach is generally applicable for compiling dataflow systems and can be implemented with high accuracy and reasonable overhead.
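
To make the aggregation idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation in Umbra. It assumes that code generation leaves behind a table tagging each generated instruction range with the pipeline and operator it belongs to, and that the profiler delivers samples as instruction addresses. The table `CODE_RANGES`, the functions `lookup` and `aggregate`, and all addresses and operator names are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical tagging table recorded during code generation.
# Each entry: (start address, end address (exclusive), pipeline, operator).
# Entries are assumed to be sorted by start address and non-overlapping.
CODE_RANGES = [
    (0x1000, 0x1040, "pipeline_1", "scan"),
    (0x1040, 0x10C0, "pipeline_1", "hash_join_probe"),
    (0x2000, 0x2080, "pipeline_2", "hash_join_build"),
]


def lookup(address, ranges=CODE_RANGES):
    """Return (pipeline, operator) for a sampled address, or None if untagged."""
    starts = [r[0] for r in ranges]
    i = bisect_right(starts, address) - 1
    if i >= 0 and ranges[i][0] <= address < ranges[i][1]:
        return ranges[i][2], ranges[i][3]
    return None


def aggregate(samples, level):
    """Count samples per component at the requested level.

    level is either "pipeline" or "operator"; samples outside every
    tagged range are counted under "<untagged>".
    """
    index = {"pipeline": 0, "operator": 1}[level]
    counts = Counter()
    for address in samples:
        tag = lookup(address)
        counts[tag[index] if tag else "<untagged>"] += 1
    return counts


# Hypothetical sampled instruction addresses.
samples = [0x1004, 0x1050, 0x1060, 0x2010, 0x3000]
by_operator = aggregate(samples, "operator")
by_pipeline = aggregate(samples, "pipeline")
```

The same set of samples can be grouped by operator or by pipeline without re-running the profiler, which is the sense in which the data can be aggregated to different abstraction levels in this sketch.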

Index Terms

Profiling dataflow systems on multiple abstraction levels
1. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Software architectures
        Data flow architectures

Published In

EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems

April 2021

631 pages

ISBN:9781450383349

DOI:10.1145/3447786

General Chairs: Antonio Barbalace (The University of Edinburgh), Pramod Bhatotia (Technical University of Munich)

Program Chairs: Lorenzo Alvisi (Cornell University), Cristian Cadar (Imperial College London)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

European Research Council (ERC)

Conference

EuroSys '21

Sponsor:

SIGOPS

EuroSys '21: Sixteenth European Conference on Computer Systems

April 26 - 28, 2021

Online Event, United Kingdom

Acceptance Rates

EuroSys '21 Paper Acceptance Rate 38 of 181 submissions, 21%;

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Cited By

Wagner B, Kohn A, Boncz P, Leis V (2024). Incremental Fusion: Unifying Compiled and Vectorized Query Execution. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 462-474. Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00042
Skordalakis E, Attwood A, Goodacre J, Luján M (2024). AccProf: Increasing the Accuracy of Embedded Application Profiling Using FPGAs. Architecture of Computing Systems, 192-206. Online publication date: 1-Aug-2024.
https://doi.org/10.1007/978-3-031-66146-4_13
Anneser C, Vogel L, Gruber F, Bandle M, Giceva J, Baumann A, Crooks N, Schwarzkopf M (2023). Programming Fully Disaggregated Systems. Proceedings of the 19th Workshop on Hot Topics in Operating Systems, 188-195. Online publication date: 22-Jun-2023.
https://dl.acm.org/doi/10.1145/3593856.3595889
Yang J, Bäuerle A, Moritz D, Demiralp Ç (2023). VegaProf: Profiling Vega Visualizations. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-11. Online publication date: 29-Oct-2023.
https://dl.acm.org/doi/10.1145/3586183.3606790
Jungmair M, Kohn A, Giceva J (2022). Designing an open framework for query optimization and compilation. Proceedings of the VLDB Endowment 15:11, 2389-2401. Online publication date: 1-Jul-2022.
https://dl.acm.org/doi/10.14778/3551793.3551801
Fent P, Birler A, Neumann T (2022). Practical planning and execution of groupjoin and nested aggregates. The VLDB Journal (The International Journal on Very Large Data Bases) 32:6, 1165-1190. Online publication date: 22-Oct-2022.
https://dl.acm.org/doi/10.1007/s00778-022-00765-x
Neumann T (2021). Evolution of a compiling query engine. Proceedings of the VLDB Endowment 14:12, 3207-3210. Online publication date: 1-Jul-2021.
https://dl.acm.org/doi/10.14778/3476311.3476410
