Scalable VPN Routing via Relaying

Changhoon Kim (Princeton University)
Alexandre Gerber, Carsten Lund, Dan Pei, and Subhabrata Sen (AT&T Labs)

 VPN service: logically-isolated communication channels for corporate customers, overlaid on top of a provider backbone
 Direct any-to-any reachability at the IP layer among customer sites
  Allows customers to avoid full-meshing and to outsource routing
 The service is growing very fast
[Figure: customer sites (Site 1, 2, 3) attached to provider-edge (PE) routers across the provider backbone]

 For isolation, a virtual PE (VPE) is created per VPN, per site
  Each VPE stores the routing information of its own VPN
 For scalability, packet forwarding in the backbone is oblivious to customer addresses (i.e., uses encapsulation)
  This makes it impossible to aggregate customer addresses inside the backbone
[Figure: per-VPN VPEs at the PEs, with tunnels across the provider backbone carrying VPN X's traffic between VPE_X instances and VPN Y's traffic between VPE_Y instances]

 Each VPE must maintain full routing information of its VPN (i.e., routes to every address block used in every site)
 The memory footprint of a VPE is its forwarding-table size
[Figure: PE1 through PE4 each hosting VPE_X, VPE_Y, and VPE_Z instances whose forwarding tables share the PE's memory]

 Memory is full, whereas lots of ports (network interfaces) are still unused
 However, revenue is proportional to provisioned bandwidth (i.e., the number of used ports), not to memory usage
 A large VPN with a thin connection per site is the worst case
  Unfortunately, there are many such worst cases

 Forwarding tables keep growing
  The number of VPN routes is significantly larger than the number of IPv4 routes
  Several routers constantly run in the "red zone"
  Providers are forced to increase expenditure or to accommodate customers sub-optimally
 Increasing fast-access memory is very hard (or, at the very least, extremely expensive)
  Due to hardware-specific constraints such as power, heat, and space (e.g., forwarding tables are built with TCAM or SRAM)
  Upgrading so many routers and line cards is prohibitively expensive
  The number of ports might also grow faster than memory size
 What can we do better with existing resources, functions, and capabilities?

 Observation: most (84%) PEs communicate only with a small number (~10%) of popular PEs
  Hubs need to maintain full reachability; for spokes, full reachability is a luxury
  The any-to-any reachability model then requires traffic indirection through a hub

 Relaying: each VPN has two different types of PEs
  Hubs: maintain the full reachability information of the VPN
  Spokes: maintain local routes and a single default route to a hub
  Each spoke uses its hub consistently for all non-local traffic (indirect forwarding; a small forwarding-table sketch follows below)
[Figure: indirect forwarding, e.g., spokes of VPN X at PE2 and PE3 reach remote sites via Hub_X at PE1]
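To make the memory asymmetry concrete, below is a minimal Python sketch of the two kinds of forwarding state. This is my own illustration, not material from the talk; all prefixes, interface names, and next hops are hypothetical.

```python
# Hedged sketch: a hub VPE holds the full VPN table, while a spoke VPE
# holds only its local routes plus one default route toward its hub.
import ipaddress

def lookup(table, dst):
    """Longest-prefix match over {prefix: next_hop}; a default route
    (0.0.0.0/0), when present, guarantees that some entry matches."""
    addr = ipaddress.ip_address(dst)
    matches = [(pfx, nh) for pfx, nh in table.items()
               if addr in ipaddress.ip_network(pfx)]
    return max(matches, key=lambda m: ipaddress.ip_network(m[0]).prefixlen)[1]

# Hypothetical VPN X with three sites (addresses are illustrative only).
hub_table = {                       # hub: full reachability
    "10.1.0.0/16": "site-1-CE",
    "10.2.0.0/16": "tunnel-to-PE2",
    "10.3.0.0/16": "tunnel-to-PE3",
}
spoke_table = {                     # spoke at PE2: local route + default
    "10.2.0.0/16": "site-2-CE",
    "0.0.0.0/0":   "tunnel-to-hub",
}

print(lookup(spoke_table, "10.3.0.1"))  # -> tunnel-to-hub (relayed)
print(lookup(hub_table,   "10.3.0.1"))  # -> tunnel-to-PE3 (direct)
```

The spoke's table stays small no matter how many sites the VPN has; only the hub's table scales with VPN size, which is the memory saving Relaying exploits.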
 Two problems to solve
  Hub selection: which PEs should be hubs?
  Hub assignment: which hub should a given spoke use?
 Caveat: solve the problems individually for each VPN
  The hub selection and assignment decisions for one VPN are totally independent of those for other VPNs
  This ensures both simplicity and flexibility

 Analyzed more than 100 VPNs with real traffic, and identified heavy sources and sinks
  Heavy PEs: those sending or receiving more than 10% of the total traffic in their VPN
  Heavy sources or sinks are only around 22% of all PEs
 When choosing those heavy PEs as hubs:
  The memory footprint (i.e., number of routes) is reduced by 73+%
  But ~10% of conversations experience path inflation larger than 1,000 miles (12 ms), and up to 5,000 miles (58 ms)
  Optimal hub assignment does not reduce this path inflation
  We need better sets of hubs

 Notation
  PE set: P = {1, 2, ..., n}
  Hub set: H ⊆ P
  Hub of PE i: hub(i) ∈ H
  Usage-based conversation matrix: C = (c_{i,j})
  Latency matrix: L = (l_{i,j})
 Formulation: choose as few hubs as possible, while limiting the additional distance due to Relaying

    min |H|
    s.t.  l_{s,hub(s)} + l_{hub(s),d} − l_{s,d} ≤ θ   for all s, d ∈ P with c_{s,d} = 1

  where the parameter θ bounds the extra latency of relayed paths

 The Latency-Constrained Relaying (LCR) problem is NP-hard (Set Cover ≤_P LCR)
 A greedy approximation (sketched in code below):
  Build a "serve-use" graph based on the latency constraint: S_i is the set of PEs that hub candidate i can serve without violating the constraint
  Find the fewest nodes on the "serve" side that cover every node on the "use" side
  At each iteration, greedily choose as a hub the candidate with the largest S_i
[Figure: example serve-use bipartite graph over five PEs, with S_1 = {1,3}, S_2 = {2}, S_3 = {1,3,5}, S_4 = {1,2,4}, S_5 = {5}]
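The serve-use construction and the "largest S_i" rule translate directly into code. Here is a hedged Python sketch of the greedy heuristic; it is my own illustration rather than the authors' implementation, and the variable names (`pes`, `lat`, `conv`, `theta`) are assumptions.

```python
# Greedy set-cover heuristic for LCR. `pes` is the PE set, `lat[s][d]`
# the latency matrix, `conv` the set of conversation pairs (s, d) with
# c_{s,d} = 1, and `theta` the extra-latency budget.
def greedy_lcr(pes, lat, conv, theta):
    # serve[h] = S_h: the PEs that hub h can serve, i.e., spokes s whose
    # every conversation (s, d) stays within budget when relayed via h.
    serve = {
        h: {s for s in pes
            if all(lat[s][h] + lat[h][d] - lat[s][d] <= theta
                   for d in pes if (s, d) in conv)}
        for h in pes
    }
    hubs, assignment = [], {}
    uncovered = set(pes)
    while uncovered:  # terminates: every PE can at least serve itself
        # Greedily pick the hub covering the most still-uncovered PEs.
        h = max(pes, key=lambda x: len(serve[x] & uncovered))
        hubs.append(h)
        for s in serve[h] & uncovered:
            assignment[s] = h  # spoke s will use hub h consistently
        uncovered -= serve[h]
    return hubs, assignment

# Toy example: three PEs on a line, full-mesh conversations, budget 2;
# a single hub ends up covering everyone.
pes = [0, 1, 2]
lat = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
conv = {(s, d) for s in pes for d in pes if s != d}
print(greedy_lcr(pes, lat, conv, theta=2))  # -> ([0], {0: 0, 1: 0, 2: 0})
```

This is the standard greedy set-cover heuristic, so it carries the usual logarithmic approximation guarantee on the number of hubs.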
 Evaluation based on the entire traffic of the VPNs during May 13-19, 2007
[Figure: gain (fraction of routes removed) and cost (fraction of traffic relayed, increase of backbone load), in percent, for latency budgets θ of 0, ~2.5, and ~11.5 msec]
 1) LCR can save ~90% of memory with very small path inflation
 2) However, the amount of relayed traffic is rather high

 Motivating questions
  Can we avoid the periodic monitoring and re-adjustment overhead, and still save memory?
  What if we bound the additional latency for any future communication?
 Idea: instead of the usage-based conversation matrix C, use a hypothetical full-mesh conversation matrix C^full = (c^full_{i,j}), where

    c^full_{i,j} = 1 if i ≠ j, and 0 if i = j

 Formulation with C^full:

    min |H|
    s.t.  l_{s,hub(s)} + l_{hub(s),d} − l_{s,d} ≤ θ   for all s, d ∈ P with c^full_{s,d} = 1

[Figure: fraction of routes removed, fraction of traffic relayed, and increase of backbone load under full-mesh conversation patterns]
 Significant memory saving even with full-mesh conversation patterns

 Motivation: reduce both the hub-set size and the amount of traffic relayed
 Notation
  Volume matrix: V = (v_{i,j}), where v_{i,j} > 0 if c_{i,j} = 1, and v_{i,j} = 0 if c_{i,j} = 0
 Formulation: additionally minimize the sum of volume-times-additional-distance products (a small sketch of this objective appears at the end of these notes)

    min  |H|  and  Σ_{s,d ∈ P} v_{s,d} · (l_{s,hub(s)} + l_{hub(s),d} − l_{s,d})
    s.t.  l_{s,hub(s)} + l_{hub(s),d} − l_{s,d} ≤ θ   for all s, d ∈ P with c_{s,d} = 1

[Figure: LCVSR vs. LCR on fraction of routes removed, fraction of traffic relayed, and increase of backbone load]
 Latency-Constrained, Volume-Sensitive Relaying (LCVSR) saves nearly as much memory as LCR does, and reduces the relayed traffic volume as well

 Deployment requires only a minor routing-protocol configuration change at the PEs (details in the paper)
 Performance degrades only slightly over time
  The cost curves are fairly robust (especially when using LCR)
  With weekly/monthly re-adjustment, 94%/91% of hubs remain hubs
 Ensuring high availability is possible
  Having more than one hub, located in different cities, ensures high availability
  98.3% of VPNs spanning 10+ PEs have at least 2 hubs anyway
  Enforcing |H| > 1 reduces the memory saving by only 0.5%

 The large memory footprint of VPN service is a critical problem that large providers face today
 Relaying can substantially reduce VPNs' memory footprint (by 80-90%) for a small increase in latency (3-11 ms) and backbone utilization (~7% of VPN traffic)
 Relaying is simple, easy to implement, and transparent
  It is under evaluation by network engineers in a large ISP
 Future work
  Devising a better solution for LCVSR
  Having hubs store smaller, disjoint sets of routes, rather than all routes
  Combining Relaying with route caching
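Closing sketch, referenced from the LCVSR formulation above: the objective LCVSR adds on top of LCR is the total volume-weighted extra distance. Below is a minimal Python version, consistent with the `greedy_lcr` sketch earlier; it is again my own illustration, since the slides do not include the paper's actual LCVSR algorithm.

```python
# Volume-weighted relaying cost from the LCVSR formulation:
#   sum over conversations of v_{s,d} * (l_{s,hub(s)} + l_{hub(s),d} - l_{s,d}).
# vol[s][d] is the traffic volume (0 when s and d do not converse), and
# assignment[s] is the hub that spoke s relays through.
def lcvsr_cost(pes, lat, vol, assignment):
    return sum(
        vol[s][d] * (lat[s][assignment[s]] + lat[assignment[s]][d] - lat[s][d])
        for s in pes for d in pes
        if vol[s][d] > 0
    )
```

One simple (hypothetical, not the paper's) way to use this: generate several feasible hub sets, e.g., by varying the tie-breaking in `greedy_lcr`, score each with `lcvsr_cost`, and keep the cheapest.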