Report of the NSF Workshop on Research Challenges in Distributed Computer Systems

Editors: M. Frans Kaashoek, Carla Ellis, Steve Gribble, Jeff Mogul, Barbara Liskov, Anthony Joseph, Ion Stoica, David Andersen, Hank Levy, Amin Vahdat∗, Mike Dahlin, Andrew Myers

December 4, 2005

1 Executive Summary

This report¹ summarizes recommendations from a workshop on research challenges in distributed computer systems, sponsored by the National Science Foundation. A program committee solicited input from the research community by asking researchers to submit position papers that identified grand challenges in distributed systems, invited researchers based on their submissions, and selected a few position papers for presentation at the workshop. Most of the workshop was organized around break-out sessions, in which we refined the research challenges and identified the facilities needed to carry out future research as well as what the distributed systems community can contribute to the facility. Information about the workshop, including the full program, the selected submissions, and the slides of the presentations and the break-out sessions, is available at http://www.pdos.lcs.mit.edu/~kaashoek/nsf/.

The workshop attendees identified a number of challenge applications whose implementation will require research advances in the design and engineering of distributed systems. Examples include: managing a large number of personal devices and data, improving the auto commute through data dissemination and using sensors and actuators in the car to avoid accidents, rapidly deploying fault-tolerant distributed systems to assist in disaster recovery, and understanding and affecting the planet in real-time. Each of these applications has security, storage, fault-tolerance, and usability requirements that can be addressed only if there are new research advances.

To spur these advances, the community has a need for a facility to support experimentation. Facilities such as PlanetLab [2] and Emulab [3] have demonstrated that the right facility can spark research progress. The new applications, however, require a scale of facility that is unavailable at present, and that includes product versions of recent research advances. Such a facility would allow researchers to leverage the recent results in tackling the next set of challenges.

The report makes the following recommendations for the research community and NSF. For the research community:

• Use the challenge applications to frame important and challenging research questions in distributed systems. The answers are likely to generate knowledge that goes well beyond the current understanding of distributed systems.

• Participate in the development of a shared facility to experiment with solutions. This development can leverage the recent advances in overlay networks, virtualization, secure global access, resource allocation, and debugging.

• Collaborate with other communities, such as the networking, sensornet, and security communities, on both collaborative research and on a network infrastructure that is suitable for designing and engineering the challenge applications, exploiting the unique opportunity to rethink the traditional layering from top to bottom.

∗ This committee was greatly assisted by the contributions of all workshop participants, listed in Appendix A.
¹ The references in this report don't follow the scientific standards for research publications.
A few references have been included to allow the reader to easily follow up on some specific points, but the references are not comprehensive.

For the National Science Foundation, the report makes the following recommendations:

• Sponsor the research identified in this report in upcoming solicitations. In particular, research in security, storage systems, simplifying management, and reliability is likely to lead to the creation of important new knowledge about the design and implementation of distributed systems needed to build the challenge applications.

• Sponsor a shared facility to carry out the research. This facility must include significant storage, points of presence around the world, a diverse set of links, and good connectivity to the existing Internet. The facility must also incorporate simulation, emulation, and traces of large-scale distributed systems, allowing researchers to explore isolated questions in controlled ways.

2 New Application Domains

This section identifies distributed applications that go well beyond what is currently possible with the existing knowledge base in distributed systems. The goal of identifying these applications is to frame research problems that can result in novel distributed systems that can support the specific applications, as well as, and perhaps more importantly, unanticipated applications that pose similar challenges. The applications identified in this report have the potential to improve society and can capture the imagination of researchers. They cover the range of important emerging distributed application domains, including networks of sensors and actuators, networked embedded devices, pervasive and ubiquitous computing, disruption-tolerant networks, mobile systems with location-aware applications, real-time acquisition of data, and management of vast amounts of information. Additionally, the example applications have requirements beyond the usual performance requirements, including security, privacy, usability, permanence, availability, flexibility, adaptability, programmability of large numbers of devices, self-monitoring, and self-management.

We identified the following applications:

• Digital life. Daily life in the near future will surround us with digital devices mediating many activities at work and at home, often invisibly affecting our environment and connecting with each other and with the global Internet. While the digitally enhanced environment can offer significant benefits, the complexity of managing large numbers of interacting embedded devices and the potential threats to security and privacy must not become burdens and risks to the user. Special instances of enhanced digital living applications include health monitoring and the management of our personal data collections:

– Health monitoring. Technology can give people more independence and mobility within the community even when they have a serious health problem. The ability to monitor the health of at-risk patients beyond just the home environment and to notify doctors or family of medically-relevant events (ranging from ones requiring immediate intervention to ones contributing to diagnosis) while protecting privacy and security of such information can allow a person to more safely pursue a normal lifestyle. Such an application involves sensors, possibly actuators, location-awareness, disruption-tolerant wireless communications interfacing to the Internet, and secure medical data.

– Managing vast personal data.
Individuals can generate large amounts of data, including digital photographs, music, and documents. They acquire even more by sharing such data with others. How does anyone locate that special photo of a family reunion several years ago among thousands stored with camera-generated filenames? Users need tools to manage their individual data, to control how sharing is done, to ensure preservation of precious family records over generations, and to organize and retrieve desired data easily. Data about individuals are also captured by others during business transactions as well as casual activities. How can people understand and have some control over what data are being disseminated and stored about them?

• Managing the auto commute. The daily commute to and from work is one of society's great aggravations and inefficiencies. Each driver makes local decisions based on the information available at that car. Networked data dissemination can significantly improve the timeliness and scope of that information. Sensors and actuators in the automobile can unburden the driver and help avoid accidents. Overall traffic flow could be more effectively modeled, analyzed in real-time, and managed. The vision is of a safer, more energy-efficient, and less annoying commute in spite of increasing traffic loads.

• Disaster recovery. Communication breakdowns and loss of vital information in a disaster scenario are critical problems that can prevent an effective response. During a disaster, it is essential to rapidly deploy a replacement communications network and keep it running as long as necessary. The affected area may lack basic infrastructure and services such as power, transportation, and security, and the network must be robust to those conditions in order to support life-critical communication services. Lost data must be restorable from geographically distributed replicas. Reducing the complexity of setting up the portable network infrastructure and providing the redundancy that allows data recovery depends on effective self-management of these systems.

• Understanding and affecting the planet in real-time. The ultimate challenge is to understand the interdependent dynamic systems of our planet—its climate, politics, populations, ecology, economy, etc. Instrumentation can provide real-time data, and large-scale computation can combine these distributed data streams to extract global meaning in order to, for example, assist in energy management. This visionary global-scale application can drive innovation in resource management, security of information, and network design.

What is missing in our knowledge of distributed systems that is needed in order to deliver the functions of these applications? We identified the following high-level challenges:

• 10^6 plug-and-play devices. If we are to be surrounded by interconnected embedded devices in our digital lives, users can't be expected to explicitly configure and maintain all of these devices. We will need the ability to automate and provide self-management of such devices.

• Balancing security, usability, and flexibility. It is clear there are tradeoffs among these goals, but the nature and impact of the tradeoffs are not well understood by system builders or by users.

• Policy expression. Our capability to articulate a policy in areas like security or privacy is limited.

• Integrating evolution into our distributed systems.
Our systems need to be resilient and adaptable to change in order to provide the unprecedented longevity of continuous service demanded by future applications. Change can take many forms: software upgrades to long-lived services, obsolescence of data formats, changing user behavior patterns and expectations, the need to retarget real-time data acquisition in an already deployed sensor network, and other unanticipated changes. Systems may have to run with a mix of old and new hardware and software, because older parts may be difficult to upgrade.

• Five nines availability. Since some of our example applications deal with life-and-death situations, the infrastructure and distributed systems addressing them must provide levels of availability and disruption-tolerance that they currently cannot achieve.

• Vast amounts of personal storage. The increasing importance of digital personal storage raises research issues of tracking data capture, providing permanence of data storage, and enabling easier data retrieval with a heterogeneous and evolving collection of devices, systems, services, formats, and networks.

• Projection of the physical world into the digital world. Linking the digital and physical worlds as required by several of our target applications requires techniques for representing and naming physical objects.

• Finding information—to search everything. Searching data that grow in volume and evolve over time, through a usable interface, is a difficult problem. When one adds the mapping of objects in the physical world into the digital world, one can envision generalizing the search problem to finding anything, including one's car keys. Finally, how does one "find" a specific person to be contacted, instead of explicitly choosing among the multitude of communication devices and networks that might be associated with that user?

3 Research problems

This section categorizes the challenges identified in Section 2 by research area and expands on them by identifying specific research challenges.

3.1 Security Challenges

Everyone knows that current distributed systems offer too little security and endanger privacy. Yet these systems are increasingly important to almost every activity, and a serious, widespread failure of security could be catastrophic. Furthermore, as reliance on the Internet grows, and networked computing becomes more and more pervasive in our lives, the situation will only get worse. New research is needed to make the global distributed computing infrastructure secure. The techniques that result from this work can enable important applications that are otherwise infeasible. For example, integrated medical information systems would allow better medical care, but they must protect patient privacy. The traffic control system mentioned in Section 2 similarly requires protection of privacy and also anonymity, since otherwise a citizen's movements could be tracked.

New research is needed on all important security problems, e.g., to prevent phishing attacks, identity theft, worms, and denial of service attacks. Here are some that in particular require attention:

• Making networks secure. The design of the Internet creates intrinsic vulnerabilities to malicious hosts. For example, attacks such as flash worms and distributed denial of service are made easier because any host on the Internet can construct an IP address and command the routing infrastructure to direct packets to that location.
Other key services such as the Domain Name Service are similarly open to abuse. If hosts may be malicious, the network must enforce some security requirements from below. The basic assumptions of the current network architecture need to be reconsidered; for example, the use of routing and name resolution can perhaps be mediated by access controls while preserving functionality.

• Confidence in the environment. In an increasingly mobile, changing world in which computation is done using small, pervasive computing devices in wireless networked environments, there is an intrinsic question of whether the device or the environment in which the computation runs can be trusted. This issue arises, for example, whenever a user uses a shared (e.g., public) computer to access sensitive data, because the shared computer may be compromised to leak data it accesses. New methods for obtaining trust in a computing device and its environment are needed. For example, attestation based on a trusted platform module may offer a better and more flexible way to obtain trust in the environment.

• Usability. Many security vulnerabilities arise because users do not understand the implications of their decisions, whether these involve choices in configuring their local machine, or simply the choice to click on a URL. New approaches are needed that simplify and automate security management, particularly in large, complex distributed systems. One promising research area is new approaches for exploring and visualizing security policies: “wizards” that can help improve the decisions that users must make about what policies to use and what to trust. Another approach is a drastic reduction in the amount of configuration users must perform, since many attacks exploit configuration mistakes.

• Incentive-based security. Complete prevention of bad behavior may be infeasible. A promising alternative approach is to use incentives that reward good behavior and punish bad behavior. If participants run a significant risk of punishment for improper actions, rational participants will have an incentive to behave better. In general, if participants have some investment in using a system, it is possible to give them incentives to behave by the rules. For example, eBay manages to deter bad behavior because participants are invested in their reputation; bad behavior hurts their reputation and makes further transactions difficult.

• Auditing. To recognize security breaches, to be able to punish intruders in court, and also to support incentive-based security, we need to track what users do and store this information in an audit trail. The tracking mechanism should be implementation-independent, so that different applications can share it, thereby avoiding duplication of work. One major issue with such a mechanism is scale: there is a great deal of behavior to keep track of, and fine-grained information is needed to capture what is really going on. To improve scalability, new techniques for abstracting and compressing system state and history are needed, because it would be prohibitively expensive to keep a complete record of all system behavior (e.g., including all network packets). A second major issue is how to monitor behavior while protecting privacy. Clearly, audit logs could be mined to violate user privacy, yet accuracy is required for the logs to be useful. For example, suppose a user is negligent in defending his computing resources.
To recognize this requires recording enough information about the evolution of the state of a computer so that a vulnerable, poorly maintained computer can be identified, yet this must be done without violating the privacy of every computer user. Thus new methods that balance privacy and accuracy are needed.

• Software support. Systems today are implemented at a relatively low level of abstraction that supports the classic host-based view of a distributed system, in which each host is in charge of just its own security. As the computing infrastructure evolves toward pervasive, embedded communicating devices sharing information, this programming model is increasingly out of sync with the way that distributed systems are constructed. Higher-level RPC-based interfaces, such as Java's Remote Method Invocation, hide some of the low-level mechanics, but tend to hurt rather than help with assessment of system trustworthiness: they remove needed control from the programmer and obscure security-critical details. A higher-level programming model is needed that will make construction of trustworthy distributed systems easier in a “post-host” world.

• Security validation. Building trustworthy distributed systems is challenging. These systems must be robust in the face of misuse or attacks, remaining available while preventing damage to critical information and leakage of confidential information. Individual techniques exist that help to enforce these integrity, availability, confidentiality, and consistency properties. But it remains difficult to build a high-assurance distributed system using these techniques in combination. Work on validation of security and fault tolerance has typically examined particular kinds of computing systems, with respect to a limited set of system-level properties. More complete techniques are needed for validating all aspects of system security in distributed systems.

3.2 Storage

Data storage is a fundamental challenge for large-scale distributed systems, and advances in storage research promise to enable a range of new high-impact applications and capabilities. Illustrative examples include:

• Data sharing for agile organizations. Hundreds or thousands of workers from several relief organizations securely share highly-available emergency response data among themselves and with victims after a regional disaster. Employees from several organizations collaborate on a joint project using a secure, shared data workspace.

• Self-managing storage. A large-scale storage system reliably and securely stores highly-available data for an enterprise, but the only maintenance it requires is that new racks of disks are added and old ones removed as storage technology advances over the course of years.

• Personal storage. An extended family shares selected photos and videos with each other and with friends, securely and over decades, with no information lost to disk failures, worm/virus attack, or careless operation of the system.

• Real-time data streams. Researchers search for, access, and analyze real-time data streams from network sensors.

• Perfect memory. Non-expert users are able to recall any word they have read, written, heard, or spoken at a relevant moment when that information is useful.

• Access-anywhere storage. A user with dozens of data-access devices (phone, camera, PDA, laptop, car, music player, ...) is able to access any of her data from any device at any time and from any location.

• Medical data.
Medical researchers analyze anonymized data for millions of patients across hundreds of electronic medical record storage systems in order to retrospectively analyze the effects of different treatments or to prospectively identify outbreaks of disease.

Although these examples only scratch the surface of the opportunities enabled by ubiquitous access to ever-increasing amounts of increasingly valuable data, a set of cross-cutting research issues arises from them:

• Long-term durability. We lack understanding of how to architect storage systems that reliably store data for decades, and we lack the ability to model or predict the reliability of a storage service or architecture over such timescales in the face of media failures, operator errors, malicious attack, business failures of storage service providers, and format evolution.

• Retrieval and interfaces. The venerable “files and folders” interface for organizing storage appears not to be the right model for a world with many more sources of non-traditional data, enormous amounts of data, non-expert users, and non-traditional applications.

• Auditability and provenance. New applications and regulatory requirements make it increasingly important to be able to track the original sources of data, who modified data, who read data, and what actions depend on what data.

• Manageability. Increasing volumes of data, increasing numbers of devices, and visibility of digital data to non-expert users make it essential to vastly reduce management cost. In particular, techniques requiring effort that is linear (or worse) in the size of storage or the number of storage devices will not scale.

• Security and sharing. Increasing diversity of applications and users makes it increasingly important and difficult to share data conveniently and securely.

• Fundamental trade-offs. Fundamental trade-offs of consistency vs. availability vs. performance demand new design trade-offs for new applications, new hardware capabilities, and new workloads.

• Meeting high-level goals. Given any set of high-level goals for a system's durability, consistency, performance, resource constraints, and availability, a system should automatically adjust its replication strategy to meet those goals.

3.3 Simplifying Management

The advance of cheap, high-performance processors has made general-purpose CPUs ubiquitous in our everyday lives: from control and audio-visual devices in our homes, to the communications devices we carry in our pockets, to the tools we use at work, to the large-scale clusters that drive the global-scale services we rely on. Unfortunately, the computer-based automation of our lives has created a major source of stress for the majority of the population. While technology has the potential to simplify our lives, it also frustrates us because of the complexity of the devices we use and the common problems we face in using them. Overall, computing and embedded devices are often prone to failure, and unintuitive and difficult to use and manage. As the number of devices grows, the difficulty of coordinating their interactions grows as well. Here we briefly examine two environments, the home and the enterprise, and look at a set of research challenges whose solution could greatly simplify management in both environments.

To begin with, homes are becoming increasingly complex, to the point where managing technology is a major challenge for most homeowners.
Most homes already have a large (and expanding) set of heterogeneous hardware devices, such as desktop and laptop computers, audio/visual and entertainment devices, and control devices (e.g., lighting, heating, security, etc.). These devices include a heterogeneous collection of software systems, including commodity operating systems, “hidden” operating systems (e.g., inside an Xbox or TiVo), and embedded control software (e.g., inside a thermostat). Furthermore, there will be a growing set of services provided to the home over the Internet, including storage and backup, monitoring, and entertainment. For the homeowner, this environment with its large set of software and hardware components can be a nightmare. For example, homes must be protected from malicious threats on the Internet. How does a naïve user know how to set the policy for their firewall or NAT box? Or, how do parents control content for their children, across the complete set of computers and other devices? Overall, users will want to know exactly what content is entering and leaving the house, and this is difficult, given the variety of communications connecting a modern home (e.g., phone, WiFi, cable, DSL, cellular). Finally, once everything is configured and working, how does a homeowner reconstruct the environment should he or she move to a new house?

The enterprise environment can be exceedingly complex, even for the professional IT managers who are tasked with controlling it. A modern enterprise includes large numbers of computers, software systems, devices, and possibly sensors. There are typically different versions or generations of hardware systems, software systems, and network infrastructure (routers, firewalls, etc.). To an increasing extent, a modern enterprise uses off-the-shelf applications for running its business; different applications are purchased from different vendors, yet these applications need to interoperate successfully, both within the machine room and across the Internet. In addition, a company may have sharing relationships with other enterprises, allowing those enterprises access to some but not all of its data. Finally, employees will typically need access to internal data and processes through mobile devices deployed outside of the company's firewall.

For the IT manager, the enterprise computing environment is a constant challenge. Setting policies and ensuring that they meet business rules is hard, and it is often difficult to maintain consistent policies across a widely distributed organization. Software is difficult to manage: installation, configuration, update, and retirement of software executing on a large body of machines is tedious and error-prone. Access control, critical for ensuring privacy and security, is often difficult to specify and implement in a heterogeneous environment. While there are many software packages that focus on problem reporting, diagnosing problems (as opposed to just noticing that something is wrong) is left to human experts. Testing within the enterprise is difficult, due to the scale of deployment; it may be necessary to reserve hundreds or even thousands of machines just for test purposes in some environments.

The research goal of this effort is to greatly simplify management of technology across the various domains. For example, a user at home should be able to purchase a new set of devices, plug them in, and have them automatically configure themselves, not just individually, but for operation with all the other devices already in the home.
In the enterprise, an IT manager should be able to easily create policies and verify that those policies meet requirements. In both domains, software should be easy to install, control, update, and remove. There are a number of research tasks whose completion will result in simplifying the management of these environments, including the following:

1. Creation of tools for easily specifying and verifying policies for security, privacy, and information control.

2. Creation of automated tools that drive configurations from those policies.

3. Creation of better tools for object access control in complex heterogeneous environments.

4. Better automated component (software) installation and life cycle management, to simplify the processes of updating software and removing outdated software.

5. Creation of mechanisms for global visibility, control, and coordination of components (both hardware and software) within an environment.

6. Better protocols for devices to learn about and communicate with each other, so that they can interact dynamically in the environment.

7. Facilities to create virtual communities, allowing individuals or organizations to communicate and collaborate flexibly.

8. Creation of tools that allow users to assess what the result of some action would be if they took that action; e.g., "Tell me if something I'm going to do will work," or "Tell me the infrastructure cost of a policy change I'm about to make."

9. Automated instrumentation, monitoring, and fault and event analysis systems, including correlation of events.

10. Time travel facilities that cross multiple hardware/software platforms, allowing systems to roll back to previous states.

3.4 24x7

Network services have become critical to national and international infrastructure. However, these services haven't achieved the same number of 9's as other infrastructure services. For example, the Internet achieves between 2 and 3 nines of availability, while the telephone network achieves between 4 and 5. Worse, as the complexity and importance of network services grows, it will be a challenge to maintain the current level of availability. Providing continuous operation in such a complex environment requires a holistic approach. Delivering continuous operation is not about any individual computer, component, or system operator, but rather about the service as a whole. Failures are inevitable; ensuring continuous service availability in the face of failure is the challenge. To address this challenge, research is needed in the following areas:

Failure Models: Most work on fault-tolerance has assumed that failures are either fail-stop or Byzantine. An important question is whether there are additional failure models in which the failed nodes exhibit behavior in between these two extremes. For example, some recent work has considered failures in which nodes are merely "selfish": they misbehave only when there is some rational advantage to be gained. Intermediate failure modes are interesting if they (1) correspond to reality and (2) allow the use of more efficient fault-tolerance techniques (e.g., replication algorithms that require fewer replicas or fewer rounds in their protocols).
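As a concrete point of reference for the replica-count argument (these are standard bounds from the replication literature, not findings of this workshop), the two classic failure models fix how many replicas n are needed to tolerate f simultaneously faulty nodes; an intermediate model is attractive precisely to the extent that it can be handled with fewer replicas or fewer protocol rounds than the Byzantine bound:

```latex
% Standard state-machine-replication bounds (background, not a workshop result):
% tolerate f faulty replicas out of n total replicas.
\[
  n \;\ge\; 2f + 1 \quad \text{(crash / fail-stop faults, e.g., Paxos-style protocols)}
\]
\[
  n \;\ge\; 3f + 1 \quad \text{(Byzantine faults, e.g., PBFT)}
\]
```

A model of "selfish but not arbitrary" nodes, for instance, is interesting to the extent that it admits protocols whose costs sit closer to the first bound than the second.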
Avoiding Correlated Failures: Work on fault-tolerance also assumes that failures are uncorrelated, but this may not be true in reality. A particular problem is failures due to deterministic errors in software, which can cause all replicas in a group of machines to fail simultaneously. N-version programming, in which each replica runs a different version of the software, is generally an impractical approach because the versions must be truly independent; this condition is met only for a few services (such as file systems), and even in these cases, relatively few versions exist. An alternative is to try to avoid deterministic errors by manipulating the code image so that, for example, a buffer overrun won't cause all versions of the code to fail. The approaches that have been investigated so far, however, have limited applicability; developing new approaches with wider applicability would be very useful.

Living with Failure: In spite of our best efforts we must assume that failures will happen. One approach to coping with this problem is to attempt to mask failures by recognizing them quickly and then causing the failed node to recover very fast. Some success along these lines has been achieved in the work on micro-reboots; more work in this area would be very useful.

Failure Detection: In general we would like systems to monitor themselves so that failed nodes can be automatically removed from service. It is easy for automatic monitoring to detect nodes that have failed in a fail-stop manner: just probe them long enough to rule out the possibility that the lack of response is due to communication problems. However, it is unclear how to detect Byzantine-faulty nodes, since they are capable of responding to probes in the appropriate manner even though they have failed. Techniques that allow such failures to be detected are needed. A related problem is determining whether a response from a single server is based on your request being processed by the code you intended to use. Some progress along these lines has been made by the work on software attestation, but better techniques are needed.

Self Configuration: We would like systems that do not rely on human operators to reconfigure, both because this is a well-known source of errors, and because with a human in the loop, reconfiguration will be slow. Furthermore, given failure detection, a system can know when nodes need to be removed from service, and their tasks distributed to other nodes. However, there are many issues to work out to achieve a practical self-configuring system.

Software Upgrades: In a long-lived system, we must expect that the software will need to be upgraded to improve performance or provide new features. Techniques for allowing software upgrades to be installed automatically are needed. One issue is that the upgrade may need to take effect gradually, and there may be long periods of time when different nodes in the system are running different, possibly incompatible, software versions; techniques are needed to allow the system to continue to provide service under these conditions. Another issue is that the upgrade may contain incorrect code, and as a result it may be necessary to roll it back to the previous version. Techniques to allow such rollbacks are also needed; this is a hard problem because the system must not lose data that came into existence prior to the rollback.

Better Tools: In spite of programmers' best efforts, errors exist in their software systems. Tools that can help avoid such errors can greatly improve the reliability of systems. The tools might take the form of a platform on which to build distributed services; the platform would provide rich functionality, thus reducing the work for the system developers, but must avoid hiding features that developers need. Or the tools might incorporate new ways of finding errors, either during the software development process, or after the fact, while the system (possibly consisting of legacy code) is running.

Availability versus Consistency: Delivering continuous operation in the face of failure typically requires replication of system state. Replication introduces known tradeoffs between data consistency and availability. On the one hand, a system could strive for consistency, waiting to update a sufficient number of data replicas for writes or to contact a sufficient number of replicas for reads before returning successfully. On the other hand, the system could return without waiting to communicate with a sufficient number of replicas. Experience with replicated systems suggests that perfect consistency results in unacceptable availability, while striving for the highest level of availability results in unacceptable consistency. While there have been various efforts to quantify and control the tradeoffs in this space, none has exported a practical programming model to system developers.
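To make the "sufficient number of replicas" dial concrete, here is a minimal quorum-replication sketch (illustrative only and not from the report; the class, the in-process replica objects, and the availability flags are invented for the example). Choosing write and read quorum sizes W and R with R + W > N forces every read to overlap every acknowledged write, which buys consistency at the cost of availability; shrinking the quorums reverses the trade:

```python
# Minimal quorum-replication sketch (illustrative only, not from the report).
# With N replicas, choosing quorum sizes with R + W > N guarantees that every
# read quorum intersects the most recent write quorum, so a read returns the
# latest acknowledged write; smaller quorums favor availability over freshness.
from dataclasses import dataclass


@dataclass
class Replica:
    version: int = 0
    value: object = None


class QuorumStore:
    def __init__(self, replicas, w, r):
        self.replicas = replicas  # in-process stand-ins for remote nodes
        self.w = w                # write quorum size
        self.r = r                # read quorum size

    def write(self, value, available):
        """Apply the write to w live replicas, or fail if too few are up."""
        live = [rep for rep, up in zip(self.replicas, available) if up]
        if len(live) < self.w:
            raise RuntimeError("write quorum unavailable")
        version = max(rep.version for rep in live) + 1
        for rep in live[:self.w]:
            rep.version, rep.value = version, value

    def read(self, available):
        """Contact r live replicas and return the highest-versioned value."""
        live = [rep for rep, up in zip(self.replicas, available) if up]
        if len(live) < self.r:
            raise RuntimeError("read quorum unavailable")
        return max(live[:self.r], key=lambda rep: rep.version).value


# N = 3, W = 2, R = 2 satisfies R + W > N: the read below still sees "v1"
# even though a different replica is down for the read than for the write.
store = QuorumStore([Replica() for _ in range(3)], w=2, r=2)
store.write("v1", available=[True, True, False])
print(store.read(available=[False, True, True]))  # -> v1
```

Dropping to R = W = 1 in the sketch keeps the service responsive under more failures but can return stale data, which is exactly the tradeoff the research challenge asks to expose through a practical programming model.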
4 Infrastructure for research

To evaluate what facilities are needed for carrying out the identified research, we considered the challenge applications discussed in Section 2, as well as the research challenges discussed in Section 3.

We have identified two distinct kinds of facilities needed to support our research. First is a distributed facility that can be used to evaluate new distributed systems in a realistic environment. Second is a local facility that allows experimentation with new systems under carefully controlled conditions. The first kind of facility is exemplified by PlanetLab today, while the second is exemplified by Emulab. An important difference between the two environments is that many systems can be running on PlanetLab nodes simultaneously, whereas in Emulab a researcher has complete use of a subset of machines to run his or her experiment. In either case, the facilities require both physical hardware resources and the software artifacts to allow users of the facility to manage and access it effectively.

The rest of this section discusses the required facilities in more detail. We answer three questions: (1) what does our community need from a facility? (2) how can our community contribute to realizing such a facility? and (3) what services can our community provide on this facility (that will be useful not only to us but to other communities as well)?

4.1 What does the community need?

To address the challenge applications, the community needs a facility that includes significant storage, points of presence around the world, a diverse set of links, and good connectivity to the existing Internet. The facility must also incorporate simulation, emulation, and traces of large-scale distributed systems. A common thread through all the challenge applications is how to handle large amounts of data, and thus the facility must support experimentation with large amounts of data. To experiment with real-world failures and to be able to handle them, the facility should have multiple points of presence across the world and a diverse set of network links. To be able to attract Internet users so that the community can observe real workloads, the facility must have good access to the Internet.
A good target for the shared facility is:

• 20-30 sites across the world
• a cluster of at least 256 machines per site
• at least 256 TB of storage per site

In addition, the community could use a sizeable number of small clusters with heterogeneous connectivity, and the facility must have sufficient resources to support the services outlined in Section 4.3; in particular, logging networking events may require a large amount of storage, and tools for processing the logged information may require a large amount of computation.

Many applications require, in addition to "common" resources, a set of application-specific resources and the artifacts to make those resources useful. Identifying and creating a subset of these application-specific facilities will be necessary to allow users of the facility to create and deploy many of the challenge applications. These application-specific facilities include:

• Vehicular networks
• Environment / habitat monitoring
• Home networks and consumer devices
• Human telemetry data monitoring
• "Digital life" facilities (personal data production)

An important requirement for the facility is evolvability. Facility users should be able to easily add and remove their resources to and from the facility. For instance, a user of the facility should be able to add her own cluster to the facility for the duration of her experiments or share her cluster with a subset of other facility users. Furthermore, users should be able to integrate into the facility entire networks such as sensor networks, wireless networks, and community networks. Evolvability would allow the facility to grow organically, as facility users' needs change or new applications emerge.

In addition to a real-world, distributed facility, the community also needs support for simulation and emulation. Emulation needs to be done in an environment in which the experimenter can control all aspects of the experiment; this requires a separate facility from the real-world facility, since sharing of resources doesn't allow the kind of control that is needed. Simulation is necessary to experiment with "larger" things than what can be done for real, e.g., huge numbers of users, very long periods of simulated time.

The recently proposed GENI facility [1] is a good starting point for fulfilling the support requirements for the challenge applications and the research issues identified in this report. Because of its cross-disciplinary nature, the GENI facility provides a unique opportunity for the distributed systems community to influence future network designs and to enhance the facility by contributing recent advances in distributed systems.

4.2 How can the community contribute?

To be really useful, the facility needs to be accompanied by software. In this section we discuss the software that is needed to make the new facilities usable. The distributed systems community has expertise in providing this kind of software.

Virtual system network. The design of any shared facility has to balance (1) efficient resource sharing and (2) predictable performance. To achieve efficient resource sharing, one needs to virtualize all facility resources, including bandwidth, CPU, memory, and storage. On the other hand, achieving predictable performance would require one to provide the user with the abstraction of a dedicated facility. With such an abstraction, a user of the facility can specify a virtual network topology by associating a desired bandwidth and delay with each virtual link, and specify the computation and storage resources at every node. To implement the required virtualization, we need to address two challenges. First, given the specification of a system network, we need to allocate the appropriate resources in the facility to implement it. This problem is an instance of the constrained distributed resource allocation problem, which is NP-complete. While several heuristics have been proposed to solve this problem, we may need to develop new ones to take advantage of the particular structure of our problem. Second, we need to enforce resource allocation on each shared resource. One way to achieve this would be to use virtual machines in conjunction with proportional-share schedulers for the CPU, and weighted fair queuing algorithms for allocating the bandwidth along virtual links.
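As an illustration of what such a specification might look like (the format, field names, and units below are hypothetical, invented for this sketch rather than taken from any existing facility), the request could be expressed declaratively and handed to the facility's allocator:

```python
# Hypothetical virtual-topology specification (names and units are
# illustrative only); the facility's allocator would map this request
# onto physical machines and links that satisfy the constraints.
virtual_topology = {
    "nodes": {
        "web":     {"cpu_cores": 2, "memory_gb": 4,  "storage_gb": 50},
        "db":      {"cpu_cores": 8, "memory_gb": 32, "storage_gb": 500},
        "client1": {"cpu_cores": 1, "memory_gb": 1,  "storage_gb": 10},
    },
    "links": [
        # each virtual link carries a desired bandwidth and delay
        {"ends": ("client1", "web"), "bandwidth_mbps": 10,   "delay_ms": 40},
        {"ends": ("web", "db"),      "bandwidth_mbps": 1000, "delay_ms": 1},
    ],
}


def sanity_check(spec):
    """Reject specifications whose links reference undeclared nodes."""
    nodes = spec["nodes"].keys()
    for link in spec["links"]:
        a, b = link["ends"]
        if a not in nodes or b not in nodes:
            raise ValueError(f"link references unknown node: {a} or {b}")


sanity_check(virtual_topology)
```

The allocator's job is then exactly the constrained resource-allocation problem described above: find physical nodes and paths that satisfy every per-node and per-link constraint simultaneously.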
Fair resource allocation. Another important problem that needs to be addressed in the context of a shared facility is how to allocate resources among the users of the facility when the facility is oversubscribed. Existing facilities use simple policies that partition the facility either in time or in space. For instance, with PlanetLab, each facility user gets a slice on each machine in the facility, and each machine divides its resources proportionally among the competing slices. As a result, the performance seen by a user degrades as the number of active users of the facility increases. In contrast, with Emulab, the resources are statically partitioned in space, with each facility user having full control of a set of machines. This allows Emulab to provide highly predictable performance to its users, but at the cost of long waiting times during oversubscription periods. These limitations, which will only be magnified as use of the facilities increases, have prompted researchers to look elsewhere for more flexible resource allocation models. One model that has gained traction recently is a market-based approach, where each user of the facility receives some virtual money, which can be used to "buy" computation, communication, and storage resources. Furthermore, if no more resources are available, facility users may be able to trade resources among themselves. The distributed systems community has a long tradition of developing market- and economics-based resource allocation schemes. This experience puts our community in an excellent position to develop the resource allocation policies and techniques for the shared facility.

Support for auditing and debugging. A major challenge in managing a large-scale facility is to identify misconfigurations, as well as potential attacks on the facility or attacks initiated by machines in the facility. A related challenge is debugging distributed applications, a notoriously difficult and time-consuming task. Recording detailed information about the execution of each application would go a long way toward addressing these challenges. Indeed, imagine that each node logs all incoming and outgoing packets and performs periodic virtual machine checkpoints. Such information would be very useful in debugging applications and in identifying attacks, because of the ability to determine communication patterns and/or packet content. Gathering such detailed information is a daunting task, as it requires not only a huge amount of storage, but also the ability to efficiently search the data. There are many trade-offs that one could make in designing such a logging system to simplify the problem. For example, one could log only the packet headers, or checkpoint inactive virtual machines less frequently (there is little need to checkpoint a virtual machine that is idle). While we believe that a basic logging system operating at a predefined granularity should always be turned on to allow auditing, the level of detail it provides is unlikely to satisfy all facility users. For instance, some facility users may want to log the content of the packets (if the basic logging system doesn't), or log other events such as timeouts and interrupts. To support this need, we have to develop a logging framework that is extensible and customizable. Since developing such a framework requires expertise in a variety of fields, including operating systems, storage, and networks, the distributed systems community is in a unique position to address this challenge.
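One plausible shape for such a framework (a sketch only; the class and method names are invented for illustration, not a proposed interface) is a small always-on core that records coarse-grained audit events, plus per-experiment sinks that can be registered to capture extra detail such as full payloads:

```python
# Sketch of an extensible logging framework (interfaces are hypothetical).
# A basic logger always records coarse-grained events for auditing;
# experimenters can register extra sinks (e.g., full packet capture,
# timeouts, interrupts) without changing the core.
import json
import time
from typing import Callable, Dict, List

Event = Dict[str, object]
Sink = Callable[[Event], None]


class NodeLogger:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.sinks: List[Sink] = [self._basic_audit_sink]  # always on

    def _basic_audit_sink(self, event: Event) -> None:
        # predefined granularity: headers/metadata only, never payloads
        record = {k: v for k, v in event.items() if k != "payload"}
        record.update(node=self.node_id, ts=time.time())
        print(json.dumps(record))  # stand-in for durable audit storage

    def register_sink(self, sink: Sink) -> None:
        """Experiment-specific extension point."""
        self.sinks.append(sink)

    def log(self, event: Event) -> None:
        for sink in self.sinks:
            sink(event)


# Example: one experiment additionally captures full packet payloads.
logger = NodeLogger("site3-node17")
packet_capture: List[Event] = []
logger.register_sink(packet_capture.append)
logger.log({"type": "packet_out", "dst": "10.0.0.7", "payload": b"...".hex()})
```

The design point the sketch is meant to highlight is that the audit path stays on unconditionally, while the expensive, detailed logging is opt-in per experiment.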
Simulators. The distributed systems community is good at building high-performance single-machine simulators using virtual machine technology. This line of research needs to be extended to building simulators for complete distributed systems, with realistic fault models. The simulators must allow very large scale simulation, since one reason for simulation is to determine whether a design and implementation meets its goals at scales larger than you can actually test, over longer time frames, with loads, faults, and changes that aren't normally seen. The community is also good at collecting data and making it available in a way that is suitable for use in simulators. The community can provide software to collect and store topologies, fault loads, and change loads, with increasing realism (more detail) and evolving as the world changes.

4.3 What services can the community provide?

This section discusses services that would be deployed to run on the facility. We have identified two categories of such services that our community can develop and deploy on the facility: basic services and generic services. To support these services, additional resources may be necessary, in particular if some of the services become popular over time.

4.3.1 Basic services

Basic services are the services necessary to carry out research experiments on the facility. One example of a basic service is an improved Domain Name Server (DNS) that provides improved resilience and allows fast updates. Another example of a basic service might be a discovery system to locate lightly-loaded and/or nearby resources. Here we discuss a few other examples in more detail:

A Distributed Authentication Infrastructure. A Public Key Infrastructure (PKI) and a Certificate Authority would allow strong identities for the facility users. Authentication is required both for the network facility itself, to grant access to applications and services and to provide a basis for resource isolation, and for applications and users. A flexible and accessible public-key or other authentication service, along with the software and resources to manage it, will bootstrap both the network itself and the development of applications on top of it. This service must include the development of libraries to allow a variety of applications to use the service and the development of guidelines for how and when applications should use the service.
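A minimal sketch of the kind of primitive such libraries would wrap (assuming the third-party pyca/cryptography package; the enrollment and challenge-response flow shown here are illustrative, and a real deployment would distribute CA-signed certificates rather than bare public keys):

```python
# Illustrative challenge-response identity check (not the facility's actual
# API). Assumes the third-party "cryptography" package; a real service would
# tie keys to certificates issued by the facility's Certificate Authority.
import os

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Enrollment: a facility user generates a key pair; the public key would be
# certified by the facility's CA and distributed through the service.
user_key = Ed25519PrivateKey.generate()
user_public = user_key.public_key()

# Authentication: a facility node challenges the user, who signs the nonce.
challenge = os.urandom(32)
signature = user_key.sign(challenge)

try:
    user_public.verify(signature, challenge)  # raises on failure
    print("identity verified; grant slice access")
except InvalidSignature:
    print("reject request")
```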
Tools for developing distributed systems. Robust, easy-to-develop applications and services are the fundamental building blocks of a successful network. Application and service developers, including researchers, will benefit from the development and availability of tools to help build distributed systems. These tools include language and compiler support, program and protocol analysis tools, and methods for program verification and validation; such tools are discussed in Section 3.4.

Global data access. A single experiment will often involve many nodes and shared data. The basic services should make it easy for programs running on different machines to access and share data in a location-independent manner. Example services in this category include distributed file systems, distributed hash tables, and facilities for flexible logging and data collection.

4.3.2 Generic services

Researchers often deploy their relatively mature systems on the shared facility for extended periods of time in order to gain understanding of their behavior under realistic loads. It may be appropriate to allow certain of these systems to run on the facility indefinitely. This is appropriate for services that are useful to large user communities, including non-computer scientists. Examples of systems that might be allowed to run indefinitely are: CiteSeer, SourceForge, a conference submission service, FastLane, spam filters, distributed firewalls, and an open search engine. We must be careful in deciding to allow mature systems to run on the facility indefinitely, since the more of these systems are supported on the facility, the less available the facility becomes for its original purpose of experimenting with new research systems.

5 Summary

This report has four major conclusions:

• Research in the challenge applications, or in other applications of a similar nature, is likely to generate new knowledge in the engineering of distributed systems.

• Effective implementation of these distributed applications requires research in many areas, including most aspects of security; the engineering of massive, searchable storage systems; the engineering of systems that operate 24x7 in the face of a wide range of failures; and the engineering of systems that are easy to use and manage.

• A shared facility would be an enabler of this kind of research.

• Recent research results in distributed systems in the areas of virtualization, resource allocation, auditing, simulators, and transparent and secure data access should be included in the shared facility, since they can significantly enhance its usability.

References

[1] National Science Foundation. The GENI initiative. http://www.nsf.gov/cise/geni/.

[2] L. Peterson, T. Anderson, D. Culler, and T. Roscoe. A blueprint for introducing disruptive technology into the Internet. In Proc. of the 1st HotNets Workshop, Oct. 2002. http://www.planet-lab.org.

[3] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler, C. Barb, and A. Joglekar. An integrated experimental environment for distributed systems and networks. In Proc. of the Fifth Symposium on Operating Systems Design and Implementation, pages 255–270, Boston, MA, Dec. 2002. USENIX Association. http://www.emulab.org.
Appendix A: Participants

Andersen, David (CMU), dga+@cs.cmu.edu
Anderson, Tom (U. Washington), tom@cs.washington.edu
Arpaci-Dusseau, Remzi (Wisconsin), remzi@cs.wisc.edu
Castro, Miguel (Microsoft), mcastro@microsoft.com
Chase, Jeff (Duke), chase@cs.duke.edu
Comer, Doug (Purdue), comer@cs.purdue.edu
Dahlin, Mike (U. Texas), dahlin@cs.utexas.edu
Druschel, Peter (Max Planck), druschel@cs.rice.edu
Ellis, Carla (Duke), carla@cs.duke.edu
Fleisch, Brett (NSF), bfleisch@nsf.gov
Gribble, Steve (U. Washington), gribble@cs.washington.edu
Jannotti, John (Brown), jj@cs.brown.edu
Joseph, Anthony (Berkeley), adj@cs.berkeley.edu
Kotz, David (Dartmouth), dfk@cs.dartmouth.edu
Lepreau, Jay (Utah), lepreau@cs.utah.edu
Levy, Hank (U. Washington), levy@cs.washington.edu
Mogul, Jeff (HP), Jeff.Mogul@hp.com
Myers, Andrew (Cornell), andru@cs.cornell.edu
Pai, Vivek (Princeton), vivek@CS.Princeton.EDU
Parulkar, Guru (NSF), gparulka@nsf.gov
Peterson, Larry (Princeton), peterson@cs.princeton.edu
Reiter, Mike (CMU), reiter@cs.cmu.edu
Roscoe, Mothy (Intel), timothy.roscoe@intel.com
Roussopoulos, Mema (Harvard), mema@eecs.harvard.edu
Satya (CMU), satya@cs.cmu.edu
Schwan, Karsten (Georgia Tech), schwan@cc.gatech.edu
Shah, Mehul (HP), mehul.shah@hp.com
Shenker, Scott (Berkeley), shenker@cs.berkeley.edu
Shrira, Liuba (Brandeis), liuba@vilnius.lcs.mit.edu
Stoica, Ion (Berkeley), istoica@cs.berkeley.edu
Terry, Doug (Microsoft), terry@microsoft.com
Touch, Joe (ISI), touch@ISI.EDU
Vahdat, Amin (UCSD), vahdat@cs.ucsd.edu
Verissimo, Paulo (Universidade de Lisboa), pjv@di.fc.ul.pt
Vogels, Werner (Amazon), werner@amazon.com
Wallach, Debby (Google), kerr@google.com
Weihl, Bill, bill@weihl.com
Welsh, Matt (Harvard), mdw@eecs.harvard.edu