
Visualizing Distributed System Executions

Published: 04 March 2020

Abstract

    Distributed systems pose unique challenges for software developers. Understanding the system’s communication topology and reasoning about concurrent activities of system hosts can be difficult. The standard approach, analyzing system logs, can be a tedious and complex process that involves reconstructing a system log from multiple hosts’ logs, reconciling timestamps among hosts with non-synchronized clocks, and understanding what took place during the execution encoded by the log. This article presents a novel approach for tackling three tasks frequently performed during analysis of distributed system executions: (1) understanding the relative ordering of events, (2) searching for specific patterns of interaction between hosts, and (3) identifying structural similarities and differences between pairs of executions. Our approach consists of XVector, which instruments distributed systems to capture partial ordering information that encodes the happens-before relation between events, and ShiViz, which processes the resulting logs and presents distributed system executions as interactive time-space diagrams. Two user studies with a total of 109 students and a case study with 2 developers showed that our method was effective, helping participants answer statistically significantly more system-comprehension questions correctly, with a very large effect size.
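The happens-before relation mentioned in the abstract can be made concrete with vector clocks, a standard mechanism for recording partial order in message-passing systems. The sketch below is a minimal illustration under stated assumptions, not XVector's actual API: the function names (`tick`, `merge_on_receive`, `happens_before`, `concurrent`) and the host names are hypothetical, and clocks are modeled as plain dictionaries mapping a host name to a counter.

```python
from typing import Dict

# A vector clock maps each host name to the number of events seen from that host.
# Missing hosts are treated as 0. (Representation chosen for illustration only.)
Clock = Dict[str, int]


def tick(clock: Clock, host: str) -> Clock:
    """Return a copy of `clock` with `host`'s own entry advanced by one."""
    updated = dict(clock)
    updated[host] = updated.get(host, 0) + 1
    return updated


def merge_on_receive(local: Clock, received: Clock, host: str) -> Clock:
    """Take the element-wise maximum of both clocks, then tick for the receive event."""
    hosts = set(local) | set(received)
    merged = {h: max(local.get(h, 0), received.get(h, 0)) for h in hosts}
    return tick(merged, host)


def happens_before(a: Clock, b: Clock) -> bool:
    """True if a <= b in every component and a < b in at least one."""
    hosts = set(a) | set(b)
    no_greater = all(a.get(h, 0) <= b.get(h, 0) for h in hosts)
    some_less = any(a.get(h, 0) < b.get(h, 0) for h in hosts)
    return no_greater and some_less


def concurrent(a: Clock, b: Clock) -> bool:
    """For two distinct events: neither one happens before the other."""
    return not happens_before(a, b) and not happens_before(b, a)


# Hypothetical two-host execution: alice sends one message to bob.
send_event = tick({}, "alice")                                   # alice: send
bob_local = tick({}, "bob")                                      # bob: unrelated local event
recv_event = merge_on_receive(bob_local, send_event, "bob")      # bob: receive

print(happens_before(send_event, recv_event))  # the send precedes the receive
print(concurrent(send_event, bob_local))       # no message links these two events
```

Comparing clocks this way needs no synchronized wall-clock time, which is why a log annotated with such clocks can be ordered across hosts and drawn as a time-space diagram.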

    Published In

    ACM Transactions on Software Engineering and Methodology  Volume 29, Issue 2
    April 2020
    200 pages
    ISSN:1049-331X
    EISSN:1557-7392
    DOI:10.1145/3386453
    Editor: Mauro Pezzè
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 March 2020
    Accepted: 01 November 2019
    Revised: 01 September 2019
    Received: 01 October 2018
    Published in TOSEM Volume 29, Issue 2

    Author Tags

    1. Distributed systems
    2. log analysis
    3. program comprehension

    Qualifiers

    • Research-article
    • Research
    • Refereed
