DOI: 10.1145/3131365.3131384
research-article

Pinpointing delay and forwarding anomalies using large-scale traceroute measurements

Published: 01 November 2017
    Abstract

    Understanding data plane health is essential to improving Internet reliability and usability. For instance, detecting disruptions in distant networks can identify repairable connectivity problems. Currently, this task is difficult and time-consuming because operators have poor visibility beyond their network's border. In this paper, we leverage the diversity of RIPE Atlas traceroute measurements to solve the classic problem of monitoring in-network delays and to obtain credible delay change estimates for monitoring network conditions in the wild. We demonstrate a set of complementary methods that detect network disruptions and report them in near real time. The first method detects delay changes for intermediate links in traceroutes. Second, a packet forwarding model predicts traffic paths and identifies faulty routers and links in cases of packet loss. In addition, we define an alarm score that aggregates changes into a single value per AS, making it easy to monitor the health of an AS while reducing the effect of uninteresting alarms. Using only existing public data, we monitor hundreds of thousands of link delays while adding no burden to the network. We present three cases demonstrating that the proposed methods detect real disruptions and provide valuable insights, as well as surprising findings, on the location and impact of the identified events.
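
    The abstract does not spell out how a delay change on an intermediate link is detected, so the following is only an illustrative sketch and should not be read as the paper's implementation. It assumes that a link's delay can be approximated by the difference between the RTTs of two adjacent traceroute hops, and that a change is flagged when the confidence interval of the current median no longer overlaps that of a reference period. The function names, the dictionary-based traceroute format, and the choice of a Wilson-score-based interval for the median are all assumptions made for this example.

```python
import math
from statistics import median


def wilson_interval(p, n, z=1.96):
    """Wilson score interval for a proportion p observed over n trials."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def median_ci(samples, z=1.96):
    """Return (lower, median, upper) using order statistics of the samples.

    The bounds are the samples whose ranks correspond to the Wilson
    interval around the 0.5 quantile (an assumption of this sketch).
    """
    if not samples:
        raise ValueError("need at least one sample")
    xs = sorted(samples)
    n = len(xs)
    lo_q, hi_q = wilson_interval(0.5, n, z)
    lower = xs[max(0, math.floor(lo_q * n))]
    upper = xs[min(n - 1, math.ceil(hi_q * n))]
    return lower, median(xs), upper


def differential_rtts(traceroutes, near, far):
    """RTT(far) - RTT(near) for every traceroute that observed both hops.

    Each traceroute is a hypothetical dict mapping hop address -> RTT in ms.
    """
    return [t[far] - t[near] for t in traceroutes if near in t and far in t]


def delay_changed(reference, current):
    """True when the two median confidence intervals do not overlap."""
    ref_lo, _, ref_hi = median_ci(reference)
    cur_lo, _, cur_hi = median_ci(current)
    return cur_lo > ref_hi or cur_hi < ref_lo


if __name__ == "__main__":
    reference = [5.0, 5.2, 4.9, 5.1, 5.3, 4.8, 5.0, 5.1, 5.2, 4.9]
    current = [x + 10 for x in reference]  # simulated 10 ms increase
    print(delay_changed(reference, current))
    print(delay_changed(reference, reference))
```

    Using the median rather than the mean keeps a handful of outlying RTT samples from dominating the estimate, which is one plausible reason to prefer it for noisy traceroute data.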

      Published In

      IMC '17: Proceedings of the 2017 Internet Measurement Conference
      November 2017
      509 pages
      ISBN:9781450351188
      DOI:10.1145/3131365

      Sponsors

      In-Cooperation

      • USENIX Assoc

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. congestion
      2. internet delay
      3. outage
      4. routing anomaly
      5. statistical analysis
      6. traceroute

      Qualifiers

      • Research-article

      Conference

      IMC '17
      IMC '17: Internet Measurement Conference
      November 1 - 3, 2017
      London, United Kingdom

      Acceptance Rates

      Overall Acceptance Rate 277 of 1,083 submissions, 26%
