SANE: A Protection Architecture For Enterprise Networks
Abstract
Connectivity in today's enterprise networks is regulated
by a combination of complex routing and bridging policies, along with various interdiction mechanisms such as
ACLs, packet filters, and other middleboxes that attempt
to retrofit access control onto an otherwise permissive
network architecture. This leads to enterprise networks
that are inflexible, fragile, and difficult to manage.
To address these limitations, we offer SANE, a protection architecture for enterprise networks. SANE defines a single protection layer that governs all connectivity within the enterprise. All routing and access control decisions are made by a logically-centralized server
that grants access to services by handing out capabilities
(encrypted source routes) according to declarative access
control policies (e.g., Alice can access http server foo).
Capabilities are enforced at each switch; the switches themselves are simple and only minimally trusted. SANE offers strong attack resistance and containment in the face of compromise, yet is practical for everyday use. Our prototype implementation shows that SANE could be deployed in current networks with only a few modifications, and it can
easily scale to networks of tens of thousands of nodes.
Introduction

Connectivity in today's enterprise networks is regulated by a combination of mechanisms, including router ACLs, firewalls, NATs, and other middleboxes, along with complex link-layer technologies such as VLANs.
Despite years of experience and experimentation,
these mechanisms are far from ideal. They require a
significant amount of configuration and oversight [43],
are often limited in the range of policies they can enforce [45], and produce networks that are complex [49]
and brittle [50]. Moreover, even with these techniques,
security within the enterprise remains notoriously poor.
Worms routinely cause significant losses in productivity [9] and potential for data loss [29, 34]. Attacks resulting in theft of intellectual property and other sensitive
information are similarly common [19].
The long and largely unsuccessful struggle to protect
enterprise networks convinced us to start over with a
clean slate, with security as a fundamental design goal.
The result is our Secure Architecture for the Networked
Enterprise (SANE). The central design goals for our architecture are as follows:
Allow natural policies that are simple yet powerful. We seek an architecture that supports natural
policies that are independent of the topology and
the equipment used, e.g., Allow everyone in group
sales to connect to the http server hosting documentation. This is in contrast to policies today that are
typically expressed in terms of topology-dependent
ACLs in firewalls. Through high-level policies, our
goal is to provide access control that is restrictive
(i.e., provides least privilege access to resources),
yet flexible, so the network does not become unusable.
Enforcement should be at the link layer, to prevent
lower layers from undermining it. In contrast, it is
common in today's networks for network-layer access controls (e.g., ACLs in firewalls) to be undermined by more permissive connectivity at the link
layer (e.g., Ethernet and VLANs).
Have only one trusted component. Today's networks trust multiple components, such as firewalls,
switches, routers, DNS, and authentication services
(e.g., Kerberos, AD, and Radius). The compromise
of any one component can wreak havoc on the entire enterprise. Our goal is to rely on a central (yet
potentially replicated) trusted entity where all policy is centrally defined and executed.
SANE achieves these goals by providing a single protection layer that resides between the Ethernet and IP
layer, similar to the place that VLANs occupy. All connectivity is granted by handing out capabilities. A capability is an encrypted source route between any two
communicating end points.
Source routes are constructed by a logically-centralized Domain Controller (DC) with a complete
view of the network topology. By granting access using
a global vantage point, the DC can implement policies
in a topology-independent manner. This is in contrast to
today's networks: the rules in firewalls and other middleboxes have implicit dependencies on topology, which
become more complex as the network and policies grow
(e.g. VLAN tagging and firewall rules) [14, 47].
By default, hosts can only route to the DC. Users must
first authenticate themselves with the DC before they can
request a capability to access services and end hosts. Access control policies are specified in terms of services
and principals, e.g., users in group martins-friends can access martin's streaming-audio server.
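Such service- and principal-level policies are easy to represent directly. The sketch below is purely illustrative: the group and service names follow the example above, but the rule format is our own, not SANE's.

```python
# Illustrative sketch of a SANE-style declarative policy, expressed over
# principals (users, groups) and named services rather than addresses or
# topology. The group/service names and rule format are assumptions.

GROUPS = {"martins-friends": {"aditya", "mike", "tal"}}

# Each rule grants a principal (a user or a group) access to one service.
POLICY = [("martins-friends", "martin.streaming-audio")]

def allowed(user: str, service: str) -> bool:
    """True if some rule grants `user` (directly or via a group) the service."""
    for principal, svc in POLICY:
        if svc == service and (principal == user or user in GROUPS.get(principal, set())):
            return True
    return False
```

Because the rules name principals and services, they remain valid when hosts move or the topology changes.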
At first glance, our approach may seem draconian: All
communication requires the permission of a central administrator. In practice, the administrator is free to implement a wide variety of policies that vary from strict
to relaxed and differ among users and services. The key
here is that SANE allows the easy implementation and
enforcement of a simply expressed rule.
Our approach might also seem dependent on a single point-of-failure (the DC) and not able to route traffic
around failures (because of static source routes). However, as we will argue, we can use standard replication
techniques, such as multiple DCs and redundant source
routes, to make the network reliable and quick to recover
from failures.
Complexity of Mechanism. A typical enterprise network today uses several mechanisms simultaneously to
protect its network: VLANs, ACLs, firewalls, NATs, and
so on. The security policy is distributed among the boxes
that implement these mechanisms, making it difficult to
correctly implement an enterprise-wide security policy.
Configuration is complex (for example, routing protocols often require thousands of lines of policy configuration [50]), making the security fragile. Furthermore, the
configuration is often dependent on network topology,
and is based on addresses and physical ports, rather than
on authenticated end-points. When the topology changes
or hosts move, the configuration frequently breaks, requires careful repair [50], and possibly undermines its
security policies.
A common response is to put all security policy in one
box and at a choke-point in the network, for example, in
a firewall at the network's entry and exit point. If an attacker makes it through the firewall, they have unfettered
access to the whole network.
Another way to address this complexity is to enforce
protection on the end host via distributed firewalls [14].
While reasonable, this has the down-side of placing all
trust in the end hosts. End host firewalls can be disabled
or bypassed, leaving the network unprotected, and they
offer no containment of malicious infrastructure, e.g., a
compromised NIDS [8].
Our new architecture allows simple high-level policies
to be expressed centrally. Policies are enforced by a single fine-grain mechanism within the network.
2.1 Threat Environment

System Architecture
SANE ensures that network security policies are enforced during all end host communication at the link
layer, as shown in Figure 1. This section describes two
versions of the SANE architecture. First, we present a
clean-slate approach, in which every network component
is modified to support SANE. Later, we describe a version of SANE that can inter-operate with unmodified end
hosts running standard IP stacks.
3.1 Domain Controller

Figure 1: A SANE packet. The SANE header is inserted between the Ethernet header and the IP header; the data follows.
The Domain Controller (DC) is responsible for authenticating users and hosts, advertising services that are available, and deciding who can connect to these services. It allows hosts to communicate by handing out capabilities (encrypted source routes). Because the network depends on it, the DC will typically be physically replicated (described in Section 3.5).
The DC performs three main functions:

1. Authentication Service: authenticates users, end hosts, and switches.

2. Network Service Directory (NSD): The NSD replaces DNS. When a principal wants access to a service, it first looks up the service in the NSD (services are published by servers using a unique name). The NSD checks for permissions (it maintains an access control list (ACL) for each service) and then returns a capability. The ACL is declared in terms of system principals (users, groups), mimicking the controls in a file system.

3. Protection Layer Controller: maintains the network topology and constructs the capabilities (encrypted source routes) that are handed out.
3.2 Network Service Directory
The NSD maintains a hierarchy of directories and services; each directory and service has an access control
list specifying which users or groups can view, access,
and publish services, as well as who can modify the
ACLs. This design is similar to that deployed in distributed file systems such as AFS [25].
As an example usage scenario, suppose martin
wants to share his MP3s with his friends aditya,
mike, and tal in the high performance networking group. He sets up a streaming audio server
on his machine bongo, which has a directory
stanford.hpn.martin.friends with ACLs
already set to allow his friends to list and acquire services. He publishes his service by adding the command
Figure 2: SANE packet types. HELLO packets are used for immediate neighbor discovery and thus are never forwarded. DC packets are used by end hosts and switches to communicate with the DC; they are forwarded by switches to the DC along a default route. FORWARD packets are used for most host-to-host data transmissions; they include an encrypted source route (capability) which tells switches where to forward the packet. Finally, REVOKE packets revoke a capability before its normal expiration; they are forwarded back along a capability's forward route.
3.3 Protection Layer

1. Initialize:
CAPABILITY = EK_servername(client-name, client-ID, server-ID, last-hop)
3.4 Interoperability
3. Finalize:
CAPABILITY = EK_clientname(client-name, client-ID, first-hop, CAPABILITY), IV
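The construction above can be sketched as onion-style wrapping: the DC encrypts the innermost (server) layer first, wraps one layer per on-path switch working backwards, and finally adds the client's outer layer. The sketch below is a toy model: a SHA-256-based XOR stream stands in for the real block cipher, and the per-layer JSON field layout (an assumed per-switch wrapping step included) is not SANE's wire format.

```python
# Toy sketch of SANE's layered capability construction (steps 1 and 3 above,
# plus an assumed per-switch wrapping step). A SHA-256-based XOR stream
# stands in for the real block cipher; the JSON layer format is illustrative.
import hashlib
import json

def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from `key` (toy cipher, not for real use)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt(key: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

decrypt = encrypt  # XOR stream: encryption is its own inverse

def wrap(key: bytes, hop: str, inner: bytes) -> bytes:
    """Add one capability layer readable only by the holder of `key`."""
    return encrypt(key, json.dumps({"hop": hop, "inner": inner.hex()}).encode())

def unwrap(key: bytes, layer: bytes):
    d = json.loads(decrypt(key, layer).decode())
    return d["hop"], bytes.fromhex(d["inner"])

def build_capability(server_key, client_key, switch_keys, hops, first_hop, server_fields):
    cap = encrypt(server_key, server_fields)                  # 1. Initialize
    for key, hop in zip(reversed(switch_keys), reversed(hops)):
        cap = wrap(key, hop, cap)                             # per-switch layer (assumed)
    return wrap(client_key, first_hop, cap)                   # 3. Finalize
```

Each switch on the forward path peels exactly one layer with its own key, learning only its next hop; no switch sees the full route.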
3.5 Fault Tolerance
Replicating the Domain Controller. The DC is logically centralized, but most likely physically replicated so
as to be scalable and fault tolerant. Switches connect
to multiple DCs through multiple spanning trees, one
rooted at each DC. To do this, switches authenticate and
send their neighbor lists to each DC separately. Topology consistency between DCs is not required as each DC
grants routes independently. Hosts randomly choose a
DC to send requests so as to distribute load.
Network-level policy, user-declared access policy, and the service directory must maintain consistency among multiple DCs. If the DCs all belong to the same enterprise (and hence trust each other), service advertisements and access control policy can be replicated between DCs using existing methods for ensuring distributed consistency. (We will consider the case where
3.6 Additional Features

Mobility. Client mobility within the LAN is transparent to servers, because the service is unaware of (and so independent of) the underlying topology. When a client
Attack Resistance

4.1 Resource Exhaustion
through as a link failure, and it will try using a different capability instead. While well-behaved senders may
have to use or request alternate capabilities, their performance degradation is only temporary, provided that there
exists sufficient link redundancy to route around misbehaving switches. Therefore, using this approach, SANE
networks can quickly converge to a state where attackers
hold no valid capabilities and cannot obtain new ones.
4.3 Tolerating a Malicious DC
Implementation

5.1 Domain Controller

The DC consists of four separate modules: the authentication service, the network service directory, and the topology and capability construction services in the Protection Layer Controller. For authentication purposes, the DC was preconfigured with the public keys of all switches.
Capability construction. For end-to-end path calculations when constructing capabilities, we use a bidirectional search from both the source and destination. All
computed routes are cached at the DC to speed up subsequent capability requests for the same pair of end hosts,
although cached routes are checked against the current
topology to ensure freshness before re-use.
Capabilities use 8-bit IDs to denote the incoming and
outgoing switch ports. Switch IDs are 32 bits and the
service IDs are 16 bits. The innermost layer of the capability requires 24 bytes, while each additional layer uses
14 bytes. The longest path on our test topologies was 10
switches in length, resulting in a 164 byte header.
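The header arithmetic follows directly from these sizes: 24 bytes for the innermost layer plus 14 bytes per switch layer along the path.

```python
# Capability size arithmetic from the prototype: a 24-byte innermost layer
# plus 14 bytes for each switch layer along the path.

INNER_BYTES = 24
LAYER_BYTES = 14

def capability_size(switches_on_path: int) -> int:
    """Capability size in bytes for a path crossing `switches_on_path` switches."""
    return INNER_BYTES + LAYER_BYTES * switches_on_path
```

For the longest 10-switch path this gives 24 + 10 × 14 = 164 bytes, matching the header size above.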
Service Directory. DNS queries for all unauthenticated users on our network resolve to the DC's IP address, which hosts a simple webserver. We provide a basic
which hosts a simple webserver. We provide a basic
HTTP interface to the service directory. Through a web
browser, users can log in via a simple web-form and can
then browse the service directory or, with the appropriate
permissions, perform other operations (such as adding
and deleting services).
The directory service also provides an interface for
managing users and groups. Non-administrative users
are able to create their own groups and use them in
access-control declarations.
To access a service, a client browses the directory tree for the desired service; each service is listed as a link. If a service is selected, the directory server checks the user's permissions. If successful, the DC generates capabilities for the flows and sends them to the client (or, more accurately, the client's SANE IP proxy). The web-server returns an HTTP redirect to the service's appropriate protocol and network address, e.g., ssh://192.168.1.1:22/. The client's browser can then launch the appropriate application if one such application exists.
To support unmodified end hosts on our prototype network, we developed proxy elements which are positioned between hosts and the first hop switches. Our
proxies use ARP cache poisoning to redirect all traffic from the end hosts. Capabilities for each flow are
cached at the corresponding proxies, which insert them
into packets from the end host and remove them from
packets to the end host.
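The proxy's data-path role can be sketched as follows; the flow key, the two-byte length prefix, and the function names are illustrative assumptions, not the prototype's actual packet layout.

```python
# Illustrative sketch of the proxy's data path: cache a capability per flow,
# prepend it to outbound packets, strip it from inbound ones. The flow key,
# two-byte length prefix, and function names are assumptions.

cap_cache = {}  # (src, dst, proto, sport, dport) -> capability bytes

def outbound(flow, payload: bytes, request_cap) -> bytes:
    """Wrap an end-host packet with its cached (or newly requested) capability."""
    if flow not in cap_cache:
        cap_cache[flow] = request_cap(flow)  # ask the DC on a cache miss
    cap = cap_cache[flow]
    return len(cap).to_bytes(2, "big") + cap + payload

def inbound(packet: bytes) -> bytes:
    """Strip the capability header before delivery to the end host."""
    cap_len = int.from_bytes(packet[:2], "big")
    return packet[2 + cap_len:]
```

Because wrapping and unwrapping happen in the proxy, the end host's IP stack never sees SANE headers at all.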
Our switch implementation supports automatic neighbor discovery, MST construction, link-state updates and
packet forwarding. Switches exchange HELLO messages every 15 seconds with their neighbors. Whenever a switch detects a network failure, it reconfigures its MST and updates the DC's network map.
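The liveness side of this can be sketched with a simple timeout rule; the 15-second HELLO interval is from the prototype, while the three-missed-HELLOs failure threshold is an assumption for illustration.

```python
# Sketch of HELLO-based failure detection. The 15-second HELLO interval is
# from the prototype; declaring failure after three silent intervals is an
# assumed threshold.

HELLO_INTERVAL = 15.0                   # seconds, per the prototype
FAILURE_THRESHOLD = 3 * HELLO_INTERVAL  # assumed cutoff before declaring failure

last_hello = {}  # neighbor id -> time the last HELLO was heard

def on_hello(neighbor: str, now: float) -> None:
    last_hello[neighbor] = now

def failed_neighbors(now: float):
    """Neighbors silent past the threshold; would trigger MST/DC map updates."""
    return [n for n, t in last_hello.items() if now - t > FAILURE_THRESHOLD]
```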
5.3 Example Operation
Evaluation

6.1 Microbenchmarks

6.2 Scalability
One potential concern with SANE's design is the centralization of function at the Domain Controller. As we discuss in Section 3.5, the DC can easily be physically replicated. Here, we seek to understand the extent to which replication would be necessary for a medium-sized enterprise environment, basing our conclusions on traffic traces
collected at the Lawrence Berkeley National Laboratory
(LBL) [36].
The traces were collected over a 34-hour period in
January 2005, and cover about 8,000 internal addresses.
The traces' anonymization techniques [37] ensure that (1) there is an isomorphic mapping between hosts' real
IP addresses and the published anonymized addresses,
and (2) real port numbers are preserved, allowing us to
identify the application-level protocols of many packets.
The trace contains almost 47 million packets, which includes 20,849 DNS requests and 145,577 TCP connections.
Figure 6 demonstrates the DNS request rate, TCP connection establishment rate, and the maximum number of
concurrent TCP connections per second, respectively.
The DNS and TCP request rates provide an estimate
for an expected rate of DC requests by end hosts in a
SANE network. The DNS rate provides a lower-bound
that takes client-side caching into effect, akin to SANE
end hosts multiplexing multiple flows using a single capability, while the TCP rate provides an upper bound.
Table 1: Capability generation rates at the DC, and corresponding switch throughput, for paths of varying length.

          DC              switch
5 hops    100,000 cap/s   762 Mb/s
10 hops   40,000 cap/s    480 Mb/s
15 hops   20,000 cap/s    250 Mb/s
Figure 6: DNS requests, TCP connection establishment requests, and maximum concurrent TCP connections per
second, respectively, for the LBL enterprise network.
Even for this upper bound, we found that the peak rate
was fewer than 200 requests per second, which is 200
times lower than what our unoptimized DC implementation can handle (see Table 1).
Next, we look at what might happen upon a link failure, whereby all end hosts communicating over the failed
link simultaneously contact the DC to establish a new
capability. To understand this, we calculated the maximum concurrent number of TCP connections in the LBL
network.9 We find that the dataset has a maximum of
1,111 concurrent connections, while the median is only
27 connections. Assuming the worst-case link failure, whereby all connections traverse the same network link which fails, our simple DC can still manage 40 times more requests.
Based on the above measurements, we estimate the
bandwidth consumption of control traffic on a SANE network. In the worst case, assuming no link failure, 200
requests per second are sent to the DC. We assume all
flows are long-lived, and that refreshes are sent every 10
minutes. With 1,111 concurrent connections in the worst
case, capability refresh requests result in at most an additional 2 packets/s.10 Given header sizes in our prototype implementation and assuming the longest path on
the network to be 10 hops, packets carrying the forward
and return capabilities will be at most 0.4 KB in size,
resulting in a maximum of 0.646 Mb/s of control traffic.
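The 0.646 Mb/s figure can be reproduced from the numbers above; we take 0.4 KB to mean 400 bytes per request packet, an assumption consistent with the stated result.

```python
# Reproducing the control-traffic estimate: ~200 capability requests/s plus
# refreshes for 1,111 long-lived flows every 10 minutes, each request packet
# at most 0.4 KB (taken here as 400 bytes, an assumption).

REQUESTS_PER_S = 200
CONCURRENT_FLOWS = 1111
REFRESH_PERIOD_S = 600            # refreshes sent every 10 minutes
PACKET_BYTES = 400                # 0.4 KB per capability-carrying packet

refresh_rate = CONCURRENT_FLOWS / REFRESH_PERIOD_S   # just under 2 packets/s
total_pkts_per_s = REQUESTS_PER_S + refresh_rate
mbps = total_pkts_per_s * PACKET_BYTES * 8 / 1e6     # peak control traffic, Mb/s
```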
This analysis of an enterprise network demonstrates
that only a few domain controllers are necessary to handle DC requests from tens of thousands of end hosts. In
fact, DC replication is probably more relevant to ensure
uninterrupted service in the face of potential DC failures.
Related Work
Weaver et al. [45] argue that existing configurations of coarse-grain network perimeters (e.g., NIDS and multiple firewalls) and end host protective mechanisms (e.g., anti-virus software) are ineffective against worms, whether employed individually or in combination. They advocate augmenting traditional coarse-grain perimeters with fine-grain protection mechanisms
throughout the network, especially to detect and halt
worm propagation.
Finally, commercial offerings from Consentry [3] introduce special-purpose bridges for enforcing access
control policy. To our knowledge, these solutions require that the bridges be placed at a choke point in the
network so that all traffic needing enforcement passes
through them. In contrast, SANE permission checking is
done at a central point only on connection setup, decoupling it from the data path. SANE's design both allows redundancy in the network without undermining network security policy and simplifies the forwarding elements.
Dealing with Routing Complexity. Misconfigured routers often make firewalls irrelevant by routing around them. The inability to reason about connectivity
in complex enterprise networks has fueled commercial
offerings such as those of Lumeta [5], to help administrators discover what connectivity exists in their network.
In their 4D architecture, Rexford et al. [41, 24] argue that decentralized routing policy, access control, and management have resulted in complex routers and
cumbersome, difficult-to-manage networks. Similar to
SANE, they argue that routing (the control plane) should
be separated from forwarding, resulting in a very simple data path. Although 4D centralizes routing policy decisions, it retains the security model of today's networks.
Routing (forwarding tables) and access controls (filtering rules) are still decoupled, disseminated to forwarding elements, and operate on the basis of weakly-bound end-point
identifiers (IP addresses). In our work, there is no need
to disseminate forwarding tables or filters, as forwarding
decisions are made a priori and encoded in source routes.
Predicate routing [43] attempts to unify security and routing by defining connectivity as a set of declarative statements from which routing tables and filters are generated. SANE differs, however, in that users are first-class objects (as opposed to end-point IDs or IP addresses in predicate routing) and thus can be used in defining access controls.
Conclusion
Acknowledgements
We would like to thank Mendel Rosenblum, Vern Paxson, Nicholas Weaver, Mark Allman and Bill Cheswick for their helpful comments on this project. We would also like to thank the anonymous reviewers for their feedback and especially our shepherd, Michael Roe, for his guidance. This research was supported in part by the Stanford Clean Slate program, the 100x100 project and NSF.
Part of this research was performed while on appointment as a U.S. Department of Homeland Security (DHS)
Fellow under the DHS Scholarship and Fellowship Program, a program administered by the Oak Ridge Institute
for Science and Education (ORISE) for DHS through an
interagency agreement with the U.S. Department of Energy (DOE). ORISE is managed by Oak Ridge Associated Universities under DOE contract number DE-AC05-00OR22750. All opinions expressed in this paper are the authors' and do not necessarily reflect the policies and
views of DHS, DOE, ORISE, or NSF. This work was also
supported in part by TRUST (The Team for Research in
Ubiquitous Secure Technology), which receives support
from the National Science Foundation (NSF award number CCF-0424422).
References
[1] 802.1D MAC Bridges. http://www.ieee802.org/1/pages/802.1D2003.html.
[2] Apani home page. http://www.apani.com/.
[3] Consentry home page. http://www.consentry.com/.
[4] DNS Service Discovery (DNS-SD). http://www.dns-sd.org/.
[5] Lumeta. http://www.lumeta.com/.
[6] UPnP Standards. http://www.upnp.org/.
[7] Cisco Security Advisory: Cisco IOS Remote Router Crash.
http://www.cisco.com/warp/public/770/ioslogin-pub.shtml, August 1998.
[8] CERT Advisory CA-2003-13 Multiple Vulnerabilities in Snort
Preprocessors. http://www.cert.org/advisories/CA-2003-13.html,
April 2003.
[9] Sasser Worms Continue to Threaten Corporate Productivity.
http://www.esecurityplanet.com/alerts/article.php/3349321, May
2004.
[10] Technical Cyber Security Alert TA04-036A: HTTP Parsing Vulnerabilities in Check Point Firewall-1. http://www.us-cert.gov/cas/techalerts/TA04-036A.html, February 2004.
[11] ICMP Attacks Against TCP Vulnerability Exploit.
http://www.securiteam.com/exploits/5SP0N0AFFU.html, April
2005.
Notes
1 A policy might be specified by many people (e.g., LDAP), but is typically centrally managed.
2 SANE is agnostic to the PKI or other authentication mechanism in use (e.g., Kerberos, IBE). Here, we will assume principals and switches have keys that have been certified by the enterprise's CA.
3 To establish shared keys, we opt for a simple key-exchange protocol from the IKEv2 [28] suite.
4 Request capabilities are similar to network capabilities as discussed in [12, 51].
5 We use the same IV for all layers (as opposed to picking a new random IV for each layer) to reduce the capability's overall size. For standard modes of operation (such as CBC and counter mode), reusing the IV in this manner does not impact security, since each layer uses a different symmetric key.
6 For example, while SANE's protection layer prevents an adversary from targeting arbitrary switches, an attacker can attempt to target a switch indirectly by accessing an upstream server to which it otherwise has access permission.
7 Normally, DC packet headers contain a consistent sender-ID in
cleartext, much like the IPSec ESP header. This sender-ID tells the
DC which key to use to authenticate and decrypt the payload. We replace this static ID with an ephemeral nonce provided by the DC. Every
DC response contains a new nonce to use as the sender-ID in the next
message.
8 Implementing threshold cryptography for symmetric encryption is
done combinatorially [16]: Start from a t-out-of-t sharing (namely, en-
[20] Y. Desmedt and Y. Frankel. Threshold Cryptosystems. In Advances in Cryptology - CRYPTO '89, 1990.
[21] J. R. Douceur. The Sybil Attack. In First Intl. Workshop on Peer-to-Peer Systems (IPTPS '02), Mar. 2002.
[23] D. M. Goldschlag, M. G. Reed, and P. F. Syverson. Hiding Routing Information. In R. Anderson, editor, Proceedings of Information Hiding: First International Workshop, pages 137-150. Springer-Verlag, LNCS 1174, May 1996.
[24] A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, J. Rexford, G. Xie, H. Yan, J. Zhan, and H. Zhang. A Clean Slate 4D Approach to Network Control and Management. In ACM SIGCOMM Computer Communication Review, October 2005.
[28] C. Kaufman. Internet Key Exchange (IKEv2) Protocol. draft-ietf-ipsec-ikev2-10.txt (work in progress).
[29] A. Kumar, V. Paxson, and N. Weaver. Exploiting Underlying Structure for Detailed Reconstruction of an Internet-Scale Event. In Proc. ACM IMC, October 2005.
[41] J. Rexford, A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, G. Xie, J. Zhan, and H. Zhang. Network-Wide Decision Making: Toward a Wafer-Thin Control Plane. In Proceedings of HotNets III, November 2004.
[42] P. Rogaway, M. Bellare, J. Black, and T. Krovetz. OCB: A Block-Cipher Mode of Operation for Efficient Authenticated Encryption. In ACM Conference on Computer and Communications Security, pages 196-205, 2001.
[43] T. Roscoe, S. Hand, R. Isaacs, R. Mortier, and P. Jardetzky. Predicate Routing: Enabling Controlled Networking. SIGCOMM Comput. Commun. Rev., 33(1):65-70, 2003.
[44] J. Veizades, E. Guttman, C. Perkins, and S. Kaplan. Service Location Protocol. RFC 2165, July 1997.