Diagnosing missing events in distributed systems with negative provenance

Published: 17 August 2014

Abstract

When debugging a distributed system, it is sometimes necessary to explain the absence of an event, for instance why a certain route is not available or why a certain packet did not arrive. Existing debuggers offer some support for explaining the presence of events, usually by providing the equivalent of a backtrace in conventional debuggers, but they are not very good at answering "Why not?" questions: there is simply no starting point for a possible backtrace.
In this paper, we show that the concept of negative provenance can be used to explain the absence of events in distributed systems. Negative provenance relies on counterfactual reasoning to identify the conditions under which the missing event could have occurred. We define a formal model of negative provenance for distributed systems, and we present the design of a system called Y! that tracks both positive and negative provenance and can use them to answer diagnostic queries. We describe how we have used Y! to debug several realistic problems in two application domains: software-defined networks and BGP interdomain routing. Results from our experimental evaluation show that the overhead of Y! is moderate.
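The abstract contrasts positive provenance (a backtrace of how a present event was derived) with negative provenance (an explanation of why an absent event was not derived). The Python sketch below illustrates that contrast on a toy rule set. It is not Y!'s formal model, which is distributed and time-aware; it is a simplified, single-node approximation that assumes ground rules (no variables), no timestamps, and an acyclic rule graph. The tuple names (`delivered`, `flowEntry`, `policyAllows`, and so on) and the helpers `holds`, `why`, `why_not`, and `root_causes` are hypothetical and invented for illustration.

```python
# Toy sketch of positive vs. negative provenance over ground rules.
# ASSUMPTION: hypothetical tuples and rules; no variables, no time, no
# distribution, and the rule graph is acyclic. This is not Y!'s model.

# Each derived tuple maps to a list of alternative bodies; the tuple holds
# if every atom in at least one body holds.
RULES: dict[str, list[list[str]]] = {
    "delivered(pkt, H2)": [["at(pkt, S1)", "flowEntry(S1, H2)"]],
    "flowEntry(S1, H2)": [["packetIn(S1, pkt)", "policyAllows(H2)"]],
}


def holds(atom: str, base: set[str]) -> bool:
    """True if atom is a base fact or derivable through some rule body."""
    if atom in base:
        return True
    return any(all(holds(b, base) for b in body) for body in RULES.get(atom, []))


def why(atom: str, base: set[str]) -> dict:
    """Positive provenance: a derivation tree (backtrace) for a present tuple."""
    if atom in base:
        return {"tuple": atom, "because": "base fact"}
    for body in RULES.get(atom, []):
        if all(holds(b, base) for b in body):
            return {"tuple": atom, "because": [why(b, base) for b in body]}
    raise ValueError(f"{atom} is absent; there is nothing to backtrace")


def why_not(atom: str, base: set[str]) -> dict:
    """Negative provenance: for every rule that could have derived the absent
    tuple, recursively explain each precondition that is missing."""
    if holds(atom, base):
        raise ValueError(f"{atom} is present; use why() instead")
    bodies = RULES.get(atom, [])
    if not bodies:
        # Leaf: a base tuple that was never inserted.
        return {"tuple": atom, "missing": "never inserted"}
    return {
        "tuple": atom,
        "missing": [
            [why_not(b, base) for b in body if not holds(b, base)]
            for body in bodies
        ],
    }


def root_causes(tree: dict) -> set[str]:
    """Collect the never-inserted base tuples at the leaves of a why_not tree."""
    if isinstance(tree["missing"], str):
        return {tree["tuple"]}
    return {
        leaf
        for body in tree["missing"]
        for sub in body
        for leaf in root_causes(sub)
    }


if __name__ == "__main__":
    base = {"at(pkt, S1)", "packetIn(S1, pkt)"}
    print(root_causes(why_not("delivered(pkt, H2)", base)))  # {'policyAllows(H2)'}
    fixed = base | {"policyAllows(H2)"}
    print(why("delivered(pkt, H2)", fixed)["tuple"])  # delivered(pkt, H2)
```

With only `at(pkt, S1)` and `packetIn(S1, pkt)` inserted, `why_not` walks the single rule for `delivered(pkt, H2)`, finds that `flowEntry(S1, H2)` is missing, and recurses until it reaches `policyAllows(H2)`, a base tuple that was never inserted. That leaf is the counterfactual answer: inserting it makes the missing event derivable, after which `why` yields an ordinary positive backtrace. When a tuple has several alternative rules, this sketch simply unions the leaves of all alternatives, which is a deliberate simplification.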

Information

Published In

ACM SIGCOMM Computer Communication Review, Volume 44, Issue 4
SIGCOMM'14
October 2014
672 pages
ISSN:0146-4833
DOI:10.1145/2740070

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 August 2014
Published in SIGCOMM-CCR Volume 44, Issue 4

Author Tags

  1. debugging
  2. diagnostics
  3. provenance

Qualifiers

  • Research-article
