End-to-end WAN service availability

Published: 01 April 2003

Abstract

This paper seeks to understand how network failures affect the availability of service delivery across wide-area networks (WANs) and to evaluate classes of techniques for improving end-to-end service availability. Using several large-scale connectivity traces, we develop a model of network unavailability that includes key parameters such as failure location and failure duration. We then use trace-based simulation to evaluate several classes of techniques for coping with network unavailability. We find that caching alone is seldom effective at insulating services from failures but that the combination of mobile extension code and prefetching can improve average unavailability by as much as an order of magnitude for classes of service whose semantics support disconnected operation. We find that routing-based techniques may provide significant improvements but that the improvements of many individual techniques are limited because they do not address all significant categories of network failures. By combining the techniques we examine, some systems may be able to reduce average unavailability by as much as one or two orders of magnitude.
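To make the "order of magnitude" claims concrete, here is a minimal sketch of the arithmetic, assuming unavailability is expressed as a fraction of time (1 minus availability). The baseline value of 1% and the helper names `nines`, `reduce_unavailability`, and `downtime_hours_per_year` are illustrative assumptions, not figures or methods from the paper.

```python
import math

HOURS_PER_YEAR = 365 * 24


def nines(unavailability: float) -> float:
    """Return the 'number of nines' of availability, e.g. 0.01 -> 2 (99%)."""
    return -math.log10(unavailability)


def reduce_unavailability(unavailability: float, orders_of_magnitude: float) -> float:
    """Apply an improvement of the given number of orders of magnitude."""
    return unavailability / (10 ** orders_of_magnitude)


def downtime_hours_per_year(unavailability: float) -> float:
    """Expected hours per year during which the service is unreachable."""
    return unavailability * HOURS_PER_YEAR


if __name__ == "__main__":
    baseline = 0.01  # hypothetical 1% average unavailability, not from the paper
    for k in (0, 1, 2):
        u = reduce_unavailability(baseline, k)
        print(
            f"{k} order(s) of magnitude: unavailability={u:.4%}, "
            f"nines={nines(u):.1f}, downtime={downtime_hours_per_year(u):.2f} h/yr"
        )
```

Under these assumptions, each order of magnitude of improvement adds one "nine" of availability and divides the expected yearly downtime by ten.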

Published In

IEEE/ACM Transactions on Networking  Volume 11, Issue 2
April 2003
159 pages

Publisher

IEEE Press

Author Tags

  1. availability
  2. disconnected operation
  3. failure model
  4. internet
  5. overlay routing
  6. replication
  7. world-wide web

Qualifiers

  • Article
