Scalable VPN Routing via Relaying

Changhoon Kim (Princeton University)
Alexandre Gerber, Carsten Lund, Dan Pei, and Subhabrata Sen (AT&T Labs)

 VPN service: logically-isolated communication channels for corporate customers, overlaid on top of a provider backbone
 Direct any-to-any reachability at the IP layer among customer sites
  Allows customers to avoid full-meshing and to outsource routing
 The service is growing very fast
[Figure: customer sites (Site 1, 2, 3) attached to provider-edge (PE) routers across the provider backbone]

 For isolation, a virtual PE (VPE) is created per VPN, per site
  Each VPE stores the routing information of its own VPN
 For scalability, packet forwarding in the backbone is oblivious to customer addresses (i.e., uses encapsulation)
  This makes it impossible to aggregate customer addresses inside the backbone
[Figure: per-VPN VPEs at the PEs, with tunnels across the provider backbone carrying VPN X's traffic between VPE_X instances and VPN Y's traffic between VPE_Y instances]

 Each VPE must maintain full routing information of its VPN (i.e., routes to every address block used in every site)
 The memory footprint of a VPE is its forwarding-table size
[Figure: PE1 through PE4 each hosting VPE_X, VPE_Y, and VPE_Z instances whose forwarding tables share the PE's memory]

 Memory is full, whereas lots of ports (network interfaces) are still unused
 However, revenue is proportional to provisioned bandwidth (i.e., the number of used ports), not to memory usage
 A large VPN with a thin connection per site is the worst case
  Unfortunately, there are many such worst cases

 Forwarding tables keep growing
  The number of VPN routes is significantly larger than the number of IPv4 routes
  Several routers constantly run in the "red zone"
  Providers are forced to increase expenditure or to accommodate customers sub-optimally
 Increasing fast-access memory is very hard (or, at the very least, extremely expensive)
  Due to hardware-specific constraints such as power, heat, and space (e.g., forwarding tables are built with TCAM or SRAM)
  Upgrading so many routers and line cards is prohibitively expensive
  The number of ports might also grow faster than memory size
 What can we do better with existing resources, functions, and capabilities?

 Observation: most (84%) PEs communicate only with a small number (~10%) of popular PEs
  Hubs need to maintain full reachability; for spokes, full reachability is a luxury
  The any-to-any reachability model then requires traffic indirection through a hub

 Relaying: each VPN has two different types of PEs
  Hubs: maintain the full reachability information of the VPN
  Spokes: maintain local routes and a single default route to a hub
  Each spoke uses its hub consistently for all non-local traffic (indirect forwarding; a small forwarding-table sketch follows below)
[Figure: indirect forwarding, e.g., spokes of VPN X at PE2 and PE3 reach remote sites via Hub_X at PE1]
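To make the memory asymmetry concrete, below is a minimal Python sketch of the two kinds of forwarding state. This is my own illustration, not material from the talk; all prefixes, interface names, and next hops are hypothetical.

```python
# Hedged sketch: a hub VPE holds the full VPN table, while a spoke VPE
# holds only its local routes plus one default route toward its hub.
import ipaddress

def lookup(table, dst):
    """Longest-prefix match over {prefix: next_hop}; a default route
    (0.0.0.0/0), when present, guarantees that some entry matches."""
    addr = ipaddress.ip_address(dst)
    matches = [(pfx, nh) for pfx, nh in table.items()
               if addr in ipaddress.ip_network(pfx)]
    return max(matches, key=lambda m: ipaddress.ip_network(m[0]).prefixlen)[1]

# Hypothetical VPN X with three sites (addresses are illustrative only).
hub_table = {                       # hub: full reachability
    "10.1.0.0/16": "site-1-CE",
    "10.2.0.0/16": "tunnel-to-PE2",
    "10.3.0.0/16": "tunnel-to-PE3",
}
spoke_table = {                     # spoke at PE2: local route + default
    "10.2.0.0/16": "site-2-CE",
    "0.0.0.0/0":   "tunnel-to-hub",
}

print(lookup(spoke_table, "10.3.0.1"))  # -> tunnel-to-hub (relayed)
print(lookup(hub_table,   "10.3.0.1"))  # -> tunnel-to-PE3 (direct)
```

The spoke's table stays small no matter how many sites the VPN has; only the hub's table scales with VPN size, which is the memory saving Relaying exploits.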
 Two problems to solve
  Hub selection: which PEs should be hubs?
  Hub assignment: which hub should a given spoke use?
 Caveat: solve the problems individually for each VPN
  The hub selection and assignment decisions for one VPN are totally independent of those for other VPNs
  This ensures both simplicity and flexibility

 Analyzed more than 100 VPNs with real traffic, and identified heavy sources and sinks
  Heavy PEs: those sending or receiving more than 10% of the total traffic in their VPN
  Heavy sources or sinks are only around 22% of all PEs
 When choosing those heavy PEs as hubs:
  The memory footprint (i.e., number of routes) is reduced by 73+%
  But ~10% of conversations experience path inflation larger than 1,000 miles (12 ms), and up to 5,000 miles (58 ms)
  Optimal hub assignment does not reduce this path inflation
  We need better sets of hubs

 Notation
  PE set: P = {1, 2, ..., n}
  Hub set: H ⊆ P
  Hub of PE i: hub(i) ∈ H
  Usage-based conversation matrix: C = (c_{i,j})
  Latency matrix: L = (l_{i,j})
 Formulation: choose as few hubs as possible, while limiting the additional distance due to Relaying

    min |H|
    s.t.  l_{s,hub(s)} + l_{hub(s),d} − l_{s,d} ≤ θ   for all s, d ∈ P with c_{s,d} = 1

  where the parameter θ bounds the extra latency of relayed paths

 The Latency-Constrained Relaying (LCR) problem is NP-hard (Set Cover ≤_P LCR)
 A greedy approximation (sketched in code below):
  Build a "serve-use" graph based on the latency constraint: S_i is the set of PEs that hub candidate i can serve without violating the constraint
  Find the fewest nodes on the "serve" side that cover every node on the "use" side
  At each iteration, greedily choose as a hub the candidate with the largest S_i
[Figure: example serve-use bipartite graph over five PEs, with S_1 = {1,3}, S_2 = {2}, S_3 = {1,3,5}, S_4 = {1,2,4}, S_5 = {5}]
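The serve-use construction and the "largest S_i" rule translate directly into code. Here is a hedged Python sketch of the greedy heuristic; it is my own illustration rather than the authors' implementation, and the variable names (`pes`, `lat`, `conv`, `theta`) are assumptions.

```python
# Greedy set-cover heuristic for LCR. `pes` is the PE set, `lat[s][d]`
# the latency matrix, `conv` the set of conversation pairs (s, d) with
# c_{s,d} = 1, and `theta` the extra-latency budget.
def greedy_lcr(pes, lat, conv, theta):
    # serve[h] = S_h: the PEs that hub h can serve, i.e., spokes s whose
    # every conversation (s, d) stays within budget when relayed via h.
    serve = {
        h: {s for s in pes
            if all(lat[s][h] + lat[h][d] - lat[s][d] <= theta
                   for d in pes if (s, d) in conv)}
        for h in pes
    }
    hubs, assignment = [], {}
    uncovered = set(pes)
    while uncovered:  # terminates: every PE can at least serve itself
        # Greedily pick the hub covering the most still-uncovered PEs.
        h = max(pes, key=lambda x: len(serve[x] & uncovered))
        hubs.append(h)
        for s in serve[h] & uncovered:
            assignment[s] = h  # spoke s will use hub h consistently
        uncovered -= serve[h]
    return hubs, assignment

# Toy example: three PEs on a line, full-mesh conversations, budget 2;
# a single hub ends up covering everyone.
pes = [0, 1, 2]
lat = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
conv = {(s, d) for s in pes for d in pes if s != d}
print(greedy_lcr(pes, lat, conv, theta=2))  # -> ([0], {0: 0, 1: 0, 2: 0})
```

This is the standard greedy set-cover heuristic, so it carries the usual logarithmic approximation guarantee on the number of hubs.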
 Evaluation based on the entire traffic of the VPNs during May 13-19, 2007
[Figure: gain (fraction of routes removed) and cost (fraction of traffic relayed, increase of backbone load), in percent, for latency budgets θ of 0, ~2.5, and ~11.5 msec]
 1) LCR can save ~90% of memory with very small path inflation
 2) However, the amount of relayed traffic is rather high

 Motivating questions
  Can we avoid the periodic monitoring and re-adjustment overhead, and still save memory?
  What if we bound the additional latency for any future communication?
 Idea: instead of the usage-based conversation matrix C, use a hypothetical full-mesh conversation matrix C^full = (c^full_{i,j}), where

    c^full_{i,j} = 1 if i ≠ j, and 0 if i = j

 Formulation with C^full:

    min |H|
    s.t.  l_{s,hub(s)} + l_{hub(s),d} − l_{s,d} ≤ θ   for all s, d ∈ P with c^full_{s,d} = 1

[Figure: fraction of routes removed, fraction of traffic relayed, and increase of backbone load under full-mesh conversation patterns]
 Significant memory saving even with full-mesh conversation patterns

 Motivation: reduce both the hub-set size and the amount of traffic relayed
 Notation
  Volume matrix: V = (v_{i,j}), where v_{i,j} > 0 if c_{i,j} = 1, and v_{i,j} = 0 if c_{i,j} = 0
 Formulation: additionally minimize the sum of volume-times-additional-distance products (a small sketch of this objective appears at the end of these notes)

    min  |H|  and  Σ_{s,d ∈ P} v_{s,d} · (l_{s,hub(s)} + l_{hub(s),d} − l_{s,d})
    s.t.  l_{s,hub(s)} + l_{hub(s),d} − l_{s,d} ≤ θ   for all s, d ∈ P with c_{s,d} = 1

[Figure: LCVSR vs. LCR on fraction of routes removed, fraction of traffic relayed, and increase of backbone load]
 Latency-Constrained, Volume-Sensitive Relaying (LCVSR) saves nearly as much memory as LCR does, and reduces the relayed traffic volume as well

 Deployment requires only a minor routing-protocol configuration change at the PEs (details in the paper)
 Performance degrades only slightly over time
  The cost curves are fairly robust (especially when using LCR)
  With weekly/monthly re-adjustment, 94%/91% of hubs remain hubs
 Ensuring high availability is possible
  Having more than one hub, located in different cities, ensures high availability
  98.3% of VPNs spanning 10+ PEs have at least 2 hubs anyway
  Enforcing |H| > 1 reduces the memory saving by only 0.5%

 The large memory footprint of VPN service is a critical problem that large providers face today
 Relaying can substantially reduce VPNs' memory footprint (by 80-90%) for a small increase in latency (3-11 ms) and backbone utilization (~7% of VPN traffic)
 Relaying is simple, easy to implement, and transparent
  It is under evaluation by network engineers in a large ISP
 Future work
  Devising a better solution for LCVSR
  Having hubs store smaller, disjoint sets of routes, rather than all routes
  Combining Relaying with route caching
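Closing sketch, referenced from the LCVSR formulation above: the objective LCVSR adds on top of LCR is the total volume-weighted extra distance. Below is a minimal Python version, consistent with the `greedy_lcr` sketch earlier; it is again my own illustration, since the slides do not include the paper's actual LCVSR algorithm.

```python
# Volume-weighted relaying cost from the LCVSR formulation:
#   sum over conversations of v_{s,d} * (l_{s,hub(s)} + l_{hub(s),d} - l_{s,d}).
# vol[s][d] is the traffic volume (0 when s and d do not converse), and
# assignment[s] is the hub that spoke s relays through.
def lcvsr_cost(pes, lat, vol, assignment):
    return sum(
        vol[s][d] * (lat[s][assignment[s]] + lat[assignment[s]][d] - lat[s][d])
        for s in pes for d in pes
        if vol[s][d] > 0
    )
```

One simple (hypothetical, not the paper's) way to use this: generate several feasible hub sets, e.g., by varying the tie-breaking in `greedy_lcr`, score each with `lcvsr_cost`, and keep the cheapest.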