DOI: 10.1145/3447786.3456254
research-article

Profiling dataflow systems on multiple abstraction levels

Published: 21 April 2021

Abstract

Dataflow graphs are a popular abstraction for describing computation, used in many systems for high-level optimization. For execution, dataflow graphs are lowered and optimized through layers of program representations down to machine instructions. Unfortunately, performance profiling of such systems is cumbersome, as today's profilers present results merely at instruction and function granularity. This obscures the connection between profiles and high-level constructs, such as operators and pipelines, making interpretation of profiles an exercise in puzzling and deduction.
In this paper, we show how to profile compiling dataflow systems at higher abstraction levels. Our approach tracks the code generation process and aggregates profiling data to any abstraction level. This bridges the semantic gap to match the engineer's current information need and even creates a comprehensible way to report timing information within profiling data. We have evaluated this approach within our compiling DBMS Umbra, showing that the approach is generally applicable for compiling dataflow systems and can be implemented with high accuracy and reasonable overhead.
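
The aggregation idea described above can be illustrated with a small sketch. It assumes that the code generator records, for every emitted instruction address, which operator and pipeline the instruction was generated for, and that a sampling profiler delivers raw instruction addresses. All names here (`TaggingDictionary`, `aggregate`, the operator and pipeline labels, the addresses) are hypothetical and are not taken from Umbra or from the paper; this is a minimal sketch under those assumptions, not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TaggingDictionary:
    """Hypothetical map from generated instruction addresses to high-level tags."""
    tags: dict = field(default_factory=dict)  # address -> {level: label}

    def record(self, address, **levels):
        # Called during code generation: remember which high-level
        # constructs this instruction was emitted for.
        self.tags[address] = dict(levels)

    def lookup(self, address, level):
        # Samples that fall outside tagged code get a fallback label.
        return self.tags.get(address, {}).get(level, "<unknown>")


def aggregate(samples, tagging, level):
    """Count profiler samples per label at the requested abstraction level."""
    return Counter(tagging.lookup(addr, level) for addr in samples)


# Illustrative data only: addresses and labels are made up.
tagging = TaggingDictionary()
tagging.record(0x1000, operator="scan", pipeline="p1")
tagging.record(0x1004, operator="hash_join_probe", pipeline="p1")
tagging.record(0x2000, operator="hash_join_build", pipeline="p0")

samples = [0x1000, 0x1004, 0x1004, 0x2000, 0x9999]

by_operator = aggregate(samples, tagging, "operator")
by_pipeline = aggregate(samples, tagging, "pipeline")
```

The same list of samples is grouped once per operator and once per pipeline, which is the sense in which one profile can be reported at whichever abstraction level the engineer currently needs.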

Published In

EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems
April 2021
631 pages
ISBN:9781450383349
DOI:10.1145/3447786

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dataflow systems
  2. profiling
  3. query compilation

Qualifiers

  • Research-article

Funding Sources

  • European Research Council (ERC)

Conference

EuroSys '21: Sixteenth European Conference on Computer Systems
April 26 - 28, 2021
Online Event, United Kingdom

Acceptance Rates

EuroSys '21 Paper Acceptance Rate 38 of 181 submissions, 21%;
Overall Acceptance Rate 241 of 1,308 submissions, 18%

Cited By

  • (2024) Incremental Fusion: Unifying Compiled and Vectorized Query Execution. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 462-474. DOI: 10.1109/ICDE60146.2024.00042. Online publication date: 13-May-2024
  • (2024) AccProf: Increasing the Accuracy of Embedded Application Profiling Using FPGAs. Architecture of Computing Systems, 192-206. DOI: 10.1007/978-3-031-66146-4_13. Online publication date: 1-Aug-2024
  • (2023) Programming Fully Disaggregated Systems. Proceedings of the 19th Workshop on Hot Topics in Operating Systems, 188-195. DOI: 10.1145/3593856.3595889. Online publication date: 22-Jun-2023
  • (2023) VegaProf: Profiling Vega Visualizations. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-11. DOI: 10.1145/3586183.3606790. Online publication date: 29-Oct-2023
  • (2022) Designing an open framework for query optimization and compilation. Proceedings of the VLDB Endowment 15:11, 2389-2401. DOI: 10.14778/3551793.3551801. Online publication date: 1-Jul-2022
  • (2022) Practical planning and execution of groupjoin and nested aggregates. The VLDB Journal - The International Journal on Very Large Data Bases 32:6, 1165-1190. DOI: 10.1007/s00778-022-00765-x. Online publication date: 22-Oct-2022
  • (2021) Evolution of a compiling query engine. Proceedings of the VLDB Endowment 14:12, 3207-3210. DOI: 10.14778/3476311.3476410. Online publication date: 1-Jul-2021