Autopilot: automatic data center management

Published: 01 April 2007

Abstract

Microsoft is rapidly increasing the number of large-scale web services that it operates. Services such as Windows Live Search and Windows Live Mail operate from data centers that contain tens or hundreds of thousands of computers, and it is essential that these data centers function reliably with minimal human intervention. This paper describes the first version of Autopilot, the automatic data center management infrastructure developed within Microsoft over the last few years. Autopilot is responsible for automating software provisioning and deployment; system monitoring; and carrying out repair actions to deal with faulty software and hardware. A key assumption underlying Autopilot is that the services built on it must be designed to be manageable. We therefore also outline the best practices adopted by applications that run on Autopilot.

Published In

ACM SIGOPS Operating Systems Review, Volume 41, Issue 2
Systems work at Microsoft Research
April 2007
93 pages
ISSN: 0163-5980
DOI: 10.1145/1243418

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automatic management
  2. cluster computing

Qualifiers

  • Article
