Emerging reconfigurable data centers introduce unprecedented flexibility in how the physical layer can be programmed to adapt to current traffic demands. These reconfigurable topologies are commonly hybrid, consisting of static and reconfigurable links, enabled by e.g., an Optical Circuit Switch (OCS) connected to top-of-rack switches in Clos networks. Even though prior work has showcased the practical benefits of hybrid networks, several crucial performance aspects are not well understood. For example, many systems enforce artificial segregation of the hybrid network parts, leaving money on the table.
In this article, we study the algorithmic problem of how to jointly optimize topology and routing in reconfigurable data centers, in order to optimize a most fundamental metric, maximum link load. The complexity of reconfiguration mechanisms in this space is unexplored at large, especially for the following cross-layer network-design problem: given a hybrid network and a traffic matrix, jointly design the physical layer and the flow routing in order to minimize the maximum link load.
We chart the corresponding algorithmic landscape in our work, investigating both un-/splittable flows and (non-)segregated routing policies. A topological complexity classification of the problem reveals NP-hardness in general for network topologies that are trees of depth at least two, in contrast to the tractability on trees of depth one. We moreover prove that the problem is not submodular for all these routing policies, even in multi-layer trees.
However, networks that can be abstracted by a single packet switch (e.g., nonblocking Fat-Tree topologies) can be optimized efficiently, and we present optimal polynomial-time algorithms accordingly. We complement our theoretical results with trace-driven simulation studies, where our algorithms can significantly improve the network load in comparison to the state-of-the-art.
1 Introduction
Data centers nowadays empower everyday life in aspects such as business, health, and industry, but also science and social interactions. With the rise of related data-intensive workloads as generated by machine learning, artificial intelligence, and the distributed processing of big data in general, data center traffic is growing very fast [63, 73]. Much of this traffic is internal to data centers, evoking considerable interest in data center design problems [64, 84].
Herein the emergence of a programmable physical layer, enabled by optical circuit switches [29, 50, 82], free-space optics [12, 37], or beamformed wireless connections [44, 45], leads to intriguing new possibilities, as leveraging fully electrically packet switched networks “is increasingly cost prohibitive and likely soon infeasible” [60, 62], see also the recent report by Microsoft [27]. In other words, electrical chips are unlikely to deliver sufficient performance for next-generation networks, and in turn, we must rely on programmable optical topologies for increased bandwidth, connectivity, and power-efficiency [5].
Extensive past work has already shown significant benefits of such reconfigurable data center networks [34, 43], but the underlying complexity is not well understood [11]. For example, many works artificially restrict their flow routing policies to be segregated between programmable and static network parts, aiming to place elephant flows on reconfigurable links [33].
Whereas some general algorithmic results exist w.r.t. latency [32, 37] or specific traffic patterns [10, 80], complexity questions of network-design for the objective of load-optimization are mostly uncharted. The exceptions are the work by Yang et al. [87], which focuses on the hardness induced by wireless interference, the work by Zheng et al. [91], who provide intractability results on general non-data-center topologies, and the results by Dai et al. [20], which uncover the approximation hardness for special settings. However, tree-induced topologies, as commonly employed in data centers, e.g., Fat-Trees, have not yet been subjected to a fine-grained complexity analysis, which can reveal a complexity dichotomy between network designs, as we will show in this article.
At the same time, link load is a most central performance metric [15, 44, 46, 76], and flow routing in traditional networks has been investigated for decades already [3]. We are thus motivated by the desire to take the first steps towards fundamentally understanding the network-design problem for load-optimization in data center networks, jointly considering flow routing and (interference-free) physical layer programmability enabled by, e.g., optical circuit switches.
1.1 Contributions
This article initiates the network-design study of load-optimization in reconfigurable networks with optical circuit switches, leveraging the flexibility of emerging programmable physical layers for flow routing. We investigate multiple problem dimensions, from splittable to unsplittable flows, to fully flexible (non-segregated) versus segregated routing policies. Our results not only include efficient algorithms and complexity characterizations but also simulations on real-world workloads:
(1)
Complexity: We prove strong NP-hardness for non-segregated and segregated routing on tree networks of height greater than or equal to two (excluding star networks), for both un-/splittable flow models; the results are summarized in Table 1. Moreover, all four problem settings are not submodular w.r.t. load-optimization, preventing common approximation techniques.
(2)
Algorithms: In turn, we give polynomial-time optimal algorithms for the hybrid switch model of Venkatakrishnan et al. [79], which applies to non-blocking data center interconnects as, e.g., Fat-Trees. To this end, we leverage a combination of subset matching results and topology-specific insights.
(3)
Evaluations: Our workload-driven simulations (using Facebook, pFabric, and high-performance computing traces) show that our algorithms significantly improve on state-of-the-art methods, decreasing the maximum load by \(1.6\times\) to \(2.0\times\).
Table 1. Network-design Complexity for Load-optimization in Reconfigurable Networks for Un-/splittable and Non-/segregated Routings when the Topologies are Trees of Height \(h=1\) and \(h\ge 2\)
Overview. We start with a formal model and preliminaries in Section 2, followed by complexity (Section 3) results for trees and algorithms for the hybrid switch model (Section 4). We then investigate the performance of our algorithms with trace-driven evaluations in Section 5. Lastly, we discuss related work in Section 6 and conclude in Section 7.
2 Model and Preliminaries
Network model. Let \(N=(V,E,\mathcal {E},C)\) be a hybrid network [56, 79] connecting the n nodes \(V=\lbrace v_1,\dots ,v_n\rbrace\) (e.g., top-of-the-rack switches), using static links E (usually connected by electrical packet switches). The network N also contains a set of reconfigurable (usually optical) links \(\mathcal {E}\). The graph \((V,E\cup \mathcal {E})\) is a bidirected1 graph, i.e., each bidirected link \(\lbrace v_i,v_j\rbrace \in E\) (respectively, \(\lbrace v_i,v_j\rbrace \in \mathcal {E}\)), where \(v_i,v_j\in V\), acts as two (anti-parallel) directed links \((v_i,v_j)\) and \((v_j,v_i)\). We use the symbol \(\overrightarrow{E}\) (respectively, \(\overrightarrow{\mathcal {E}}\)) to denote the set of corresponding directed links of E (respectively, \(\mathcal {E}\)). Moreover, a function \(C: \overrightarrow{E}\cup \overrightarrow{\mathcal {E}}\mapsto \mathbb {R}^+\) defines capacities for both directions of each bidirected link in \(E\cup \mathcal {E}\). Note that \((V,E\cup \mathcal {E})\) can be a multi-graph, e.g., when a reconfigurable link in \(\mathcal {E}\) also connects two endpoints of a static link in E.
Reconfigured network. We say that a hybrid network N is reconfigured by a reconfigurable switch S if some reconfigurable links \(M\subseteq \mathcal {E}\), which must induce a matching,2 are configured (implemented) by S to enhance the static network \((V,E)\). The set of configured (bidirected) links M, i.e., a matching, is called a reconfiguration of N. The enhanced network obtained by integrating the configured links M with the static links E of the hybrid network N is called a reconfigured network, i.e., \(N(M)=\left(V, E\cup M \right)\). The static network \((V,E)\) of the hybrid network N before reconfiguration can also be thought of as a reconfigured network, denoted by \(N(\emptyset)\).
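For concreteness, the following minimal sketch shows one possible in-memory representation of a reconfigured network \(N(M)=\left(V, E\cup M \right)\) as a directed multi-graph with per-direction capacities; the use of NetworkX and the attribute names are our own illustrative assumptions, not the article's implementation.

```python
# Sketch only: one way to materialize N(M); 'kind' and 'capacity' are assumed names.
import networkx as nx

def reconfigured_network(static_links, matching, cap_static, cap_reconfig):
    """static_links: bidirected links E; matching: configured links M (must form a matching);
    cap_static/cap_reconfig: dicts mapping directed links (u, v) to capacities."""
    n = nx.MultiDiGraph()
    for u, v in static_links:            # each bidirected link yields two directed links
        n.add_edge(u, v, kind='static', capacity=cap_static[(u, v)])
        n.add_edge(v, u, kind='static', capacity=cap_static[(v, u)])
    for u, v in matching:                # the configured links M taken from the matching
        n.add_edge(u, v, kind='configured', capacity=cap_reconfig[(u, v)])
        n.add_edge(v, u, kind='configured', capacity=cap_reconfig[(v, u)])
    return n
```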
Hardware. Our results also apply to non-optical switches and links, as long as they match the theoretical properties described in the model. As such, we will only talk about reconfigurable switches and reconfigurable links, implying any appropriate technology that matches our model.
Topologies. Our network model does not place a restriction on the underlying static topology and hence can be applied generally. Notwithstanding, for our hardness results in Section 3, already tree topologies suffice, whereas our positive algorithmic results cover many data center topologies, as we elaborate from Section 4 onwards.
Traffic demands. The resulting network should serve a certain communication pattern, represented as a \(|V| \times |V|\) communication matrix \(D:=(d_{ij})_{|V| \times |V|}\) (demands) with non-negative real-valued entries. An entry \(d_{ij}\in \mathbb {R}^+\) represents the traffic load (frequency) or a demand from the node \(v_i\) to the node \(v_j\). With a slight abuse of notation, let \(D(v_i,v_j)\) also denote a demand from \(v_i\) to \(v_j\) hereafter.
Routing models. For networking, unsplittable routing requires that all flows of a demand must be sent along a single (directed) path, while splittable routing does not restrict the number of paths used for the traffic of each demand. For a reconfigured network, segregated routing requires each flow to be transmitted on either static links or configured links only, whereas non-segregated routing allows configured links to be used as shortcuts for flows along static links [29, 82]. Hence, there are four different routing models: Unsplittable and Segregated (US), Unsplittable and Non-segregated (UN), Splittable and Segregated (SS), and Splittable and Non-segregated (SN).
2.1 Load Preliminaries
“As minimizing the maximum congestion level of all links is a desirable feature of DCNs [44, 46], the objective of our work is to minimize the maximum link utilization of the entire network.”
Yang et al. [87], presented at ACM SIGMETRICS 2020 [88]
Load optimization. Given a reconfigured network \(N(M)\) and demands D, let \(f:\overrightarrow{E}\cup \overrightarrow{M}\mapsto \mathbb {R}^+\) be a feasible flow serving demands D in \(N(M)\) under a routing model \(\tau \in \lbrace \text{US}, \text{UN}, \text{SS}, \text{SN}\rbrace\). The load of each directed link \(e\in \overrightarrow{E}\cup \overrightarrow{M}\) induced by the flow f is defined as \(L(f\left(e\right)): = f\left(e\right)/C\left(e\right)\). Then, for a feasible flow f in \(N(M)\), the maximum load is defined as \({L_\text {max}(f)}:=\max \lbrace L(f(e)): e\in \overrightarrow{E}\cup \overrightarrow{M}\rbrace\), and there must be an optimal flow \(f_{\text{opt}}\) to serve D such that its maximum load is minimized over all feasible flows in \(N(M)\). Such an optimal flow is called a load-optimization flow in \(N(M)\).3 For a reconfigured network \(N(M)\), with a slight abuse of notation, let \(f^{M}_{\text{opt}}\) denote an arbitrary load-optimization flow in \(N(M)\); then we define a function \(L_{\text{min-max}}(N(M)):=L_{\text{max}}(f^{M}_{\text{opt}})\).
Load-optimization reconfiguration problem. Given a hybrid network N, a routing model \(\tau \in \left\lbrace \text{US}, \text{UN}, \text{SS}, \text{SN} \right\rbrace\), and demands D, the \(\tau\)-load-optimization reconfiguration problem is to find an optimal reconfiguration \(M\subseteq \mathcal {E}\) to generate an optimally reconfigured network \(N(M)\) such that \(L_{\text{min-max}}\left(N\left(M\right) \right)\) is minimized over all valid reconfigurations \(M_i\subseteq \mathcal {E}\) of N. The \(\tau\)-load-optimization reconfiguration problem is also abbreviated as the \(\tau\)-reconfiguration problem henceforth. We lastly need to find a load-optimization flow for the optimally reconfigured network.
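As a minimal illustration of the load definitions (a sketch, not the article's implementation), the per-link loads and \(L_\text {max}(f)\) can be computed directly from a flow assignment and the capacity function:

```python
# Sketch: compute L(f(e)) = f(e)/C(e) per directed link and the maximum load L_max(f).
def link_loads(flow, capacity):
    """flow, capacity: dicts mapping directed links (u, v) to non-negative reals."""
    return {e: flow.get(e, 0.0) / capacity[e] for e in capacity}

def max_load(flow, capacity):
    return max(link_loads(flow, capacity).values())
```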
To illustrate the \(\tau\)-load-optimization reconfiguration problem, we give a small example in Figure 1. Figure 1(a) depicts the hybrid network before adding any reconfiguration, with five nodes \(V=\lbrace a,b,c,d,e\rbrace\), four static (bidirected) links E: \(\lbrace d,c\rbrace\), \(\lbrace b,c\rbrace\), \(\lbrace a,c\rbrace\), and \(\lbrace e,c\rbrace\) and six reconfigurable (bidirected) links \(\mathcal {E}\): \(\lbrace a,d\rbrace\), \(\lbrace d,b\rbrace\), \(\lbrace b,e\rbrace\), \(\lbrace a,e\rbrace\), \(\lbrace a,b\rbrace\), and \(\lbrace d,e\rbrace\).
Fig. 1.
We consider the routing model \(\tau =\text{SN}\) and a capacity function \(\forall e\in \overrightarrow{E}\cup \overrightarrow{\mathcal {E}}: C(e)=20\), with the five demands \(D\left(a,b\right) =8\), \(D\left(a,c\right)=6\), \(D\left(c,b\right) =6\), \(D\left(d,b\right) =6\), and \(D\left(a,e\right) =6\). In Figure 1(a), each flow can only be routed along static links, creating a link load of \(20/20=1\) on, e.g., \(\left(a,c\right)\) with the three demands of sizes 8, 6, and 6 originating at a. In order to improve the maximum link load, one could, e.g., greedily add reconfigurable links in order to reduce the maximum load, such as \(\lbrace a,b\rbrace\) in Figure 1(b). Now, the demand \(D\left(a,b\right) =8\) is routed directly, reducing the maximum load to just 0.6. Yet, only one further reconfigurable link can be chosen, \(\lbrace d,e\rbrace\), without violating the matching constraints. In this situation, any further rerouting does not decrease the maximum link load. For example, when attempting to alleviate the load of 0.6 on \(\left(c,b\right)\), the load on \(\left(a,c\right)\) will increase, and vice versa, in the best case canceling each other’s load increase.
Notwithstanding, we can improve the maximum load further. To this end, we select \(\lbrace a,e\rbrace\) and \(\lbrace d,b\rbrace\) as reconfigurable links, as shown in Figure 1(c). At first, this might seem counter-intuitive, as \(D\left(a,e\right)\) and \(D\left(d,b\right)\) are only of size 6 each, leaving a load of 0.7 on the links \(\left(a,c\right)\) and \(\left(c,b\right)\). However, the demand \(D\left(a,b\right) =8\) can additionally use the configured links as shortcuts: splitting it evenly between the static path \(\left(a,c,b\right)\) and the indirect path \(\left(a,e,c,d,b\right)\) yields an optimal maximum link load of 0.5.
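The claimed optimum can be verified by a quick computation (our own illustration of the splittable, non-segregated routing of Figure 1(c), with the uniform capacity of 20 as above):

```python
# Loads in Figure 1(c): configured links {a, e} and {d, b}; D(a, b) = 8 is split into
# 4 units on the static path (a, c, b) and 4 units on the detour (a, e, c, d, b).
capacity = 20.0
flow = {
    ('a', 'c'): 6 + 4,  # D(a, c) plus half of D(a, b)
    ('c', 'b'): 6 + 4,  # D(c, b) plus half of D(a, b)
    ('a', 'e'): 6 + 4,  # D(a, e) on the configured link plus half of D(a, b)
    ('e', 'c'): 4,      # detour of D(a, b)
    ('c', 'd'): 4,      # detour of D(a, b)
    ('d', 'b'): 6 + 4,  # D(d, b) on the configured link plus half of D(a, b)
}
print(max(f / capacity for f in flow.values()))  # 0.5
```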
3 Complexity
In this section, we consider the underlying complexity of the load-optimization problem in reconfigurable networks. We begin with the investigation of NP-hardness, where we study segregated routing (Section 3.1) and non-segregated routing (Section 3.2). For all four routing models, we prove NP-hardness for trees of height two or greater.
Yang et al. [87] considered the case of unsplittable segregated routing on trees and showed weak NP-hardness, i.e., hardness for large demand sizes. Our NP-hardness results also hold for small demand sizes, and we moreover extend the previous result [87] to trees of height one. To show hardness, we can consider special cases where all directed links have the same capacity of \(\gamma \in \mathbb {R}^+\). In particular, we set \(\gamma =1\) in all our NP-hardness proofs, such that the load of each link equals the flow size on it, but our proofs work for arbitrary \(\gamma\).
We then prove in Section 3.3 that all four routing models are not submodular, i.e., resist common approximation schemes. Venkatakrishnan et al. [79] considered different objective functions and showed submodularity for the hybrid switching model, resulting in approximation algorithms which therefore cannot be applied here.
3.1 Segregated Routing
We start with the case of segregated routing w.r.t. NP-hardness. The following and some later proofs will make use of the strongly NP-hard 3-Partition problem, which we define first: given \(3m\) positive integers \(a_1,\dots ,a_{3m}\) with \(\sum _{i} a_i=mB\) and \(B/4 \lt a_i \lt B/2\) for each i, decide whether the integers can be partitioned into m triples such that each triple sums to exactly B.
3.2 Non-segregated Routing
For the non-segregated routing model, we obtain
—
weak NP-hardness for trees of height \(h=1\) in the UN model,
—
strong NP-hardness for trees of height \(h\ge 2\) in the UN model,
—
strong NP-hardness for trees of height \(h\ge 2\) in the SN model.
For the UN model, we start with the weakly NP-hard case of \(h=1\) in Theorem 3.4, followed by the strongly NP-hard case of \(h=2\) in Theorem 3.5. To show weak NP-hardness, we give a reduction from the weakly NP-hard 2-Partition problem, which is defined as follows: given n positive integers \(a_1,\dots ,a_{n}\), decide whether they can be partitioned into two subsets of equal sum.
Now, it remains to cover intractability for the fourth and remaining routing model:
3.3 Non-submodularity
The submodularity of objective functions plays an important role in approximating optimization problems [78], as by Venkatakrishnan et al. [79] for hybrid switch networks. However, their objective function does not consider load-balancing and hence does not apply in our setting, as we show next.
Definition of submodularity. We recall the definition of submodularity [38]: A function \(f:2^{B}\mapsto \mathbb {R}\), where \(2^{B}\) is the power set of a finite set B, is submodular if it satisfies, for every \(X,Y \subseteq B\) with \(X\subseteq Y\) and every \(x\in B \setminus Y\), that \(f(X\cup \lbrace x\rbrace)-f(X) \ge f(Y\cup \lbrace x\rbrace)-f(Y)\).
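To make the inequality concrete, the following brute-force check (illustrative only; the article's non-submodularity proofs rely on specific hybrid-network counter-examples, not on this snippet) tests a set function on a small ground set:

```python
# Sketch: exhaustively test f(X ∪ {x}) - f(X) >= f(Y ∪ {x}) - f(Y) for all X ⊆ Y, x ∉ Y.
from itertools import combinations

def is_submodular(f, ground_set):
    """f maps frozensets over ground_set to reals; ground_set is a set."""
    subsets = [frozenset(s) for r in range(len(ground_set) + 1)
               for s in combinations(ground_set, r)]
    for X in subsets:
        for Y in subsets:
            if not X <= Y:
                continue
            for x in ground_set - Y:
                if f(X | {x}) - f(X) < f(Y | {x}) - f(Y):
                    return False
    return True
```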
Overview. In this subsection, we investigate the submodularity of the objective function \(\Phi\) of a \(\tau\)-reconfiguration problem, which minimizes the maximum load of reconfigured networks \(N(M)\), i.e., \(L_{\text{min-max}}\left(N \left(M\right) \right)\), over all valid reconfigurations M of a given hybrid network N. Moreover, we are also interested in the submodularity of the objective function \(\Omega\) that maximizes the gap of the minimized maximum load between the given hybrid network N before reconfiguration and the reconfigured networks \(N(M)\) for reconfigurations M of N. We will show that neither \(\Phi\) nor \(\Omega\) is submodular, by presenting special instances as counter-examples.
4 Hybrid Switch Networks
As we saw before, already tree networks of height \(\ge 2\) are NP-hard to optimize, and optimizations leveraging submodularity are not possible. Yet it is worth noting that NP-hardness for stars, i.e., trees of depth one, is still open, since the NP-hardness constructions for trees of height \(\ge 2\) collapse on the simple structure of star topologies. In fact, many NP-hard problems become tractable after restricting the input graphs, e.g., minimum vertex cover becomes polynomially solvable on trees by using dynamic programming [19]. This raises the interesting question of whether we can obtain optimal and polynomial-time algorithms for data center networks that can be abstracted as a star topology.
4.1 Non-blocking Data Center Topologies
Common data center topologies have trees of height 2 as subgraphs or minors and hence seem like bad candidates for efficient algorithms at first glance. However, already early designs adapted from telecommunications, such as Clos [18] topologies, have a so-called non-blocking property, which we can use to our advantage. An interconnecting topology \(\mathcal {C}\) is non-blocking if the servability of a flow from \(v_1\) to \(v_2\) via \(\mathcal {C}\) only depends on the utilization of the links \((v_1,\mathcal {C})\) and \((\mathcal {C}, v_2)\): “such an interconnect behaves like a crossbar switch” [89]. In other words, from a load-utilization perspective, the maximum load inside \(\mathcal {C}\) will not be higher than on the egress/ingress links of \(\mathcal {C}\). Non-blocking interconnects have hence become popular data center topologies [4], in particular in the form of folded Clos networks or Fat-Trees [54], depicted in Figure 2(a): the actual topology inside the interconnect (marked in a blue rectangle) is immaterial and we only need to consider the links incident to the nodes4—a fact commonly used, e.g., for bandwidth guarantees of the hose model [25] in Clos topologies [40, Section 4.1].
Fig. 2.
Thus, for our purposes, we can abstract the data center interconnect \(\mathcal {C}\) (which can be understood as a packet switch) by a single center node c, leaving our previous intractability considerations behind. We hence turn our attention to hybrid switch networks as considered by Venkatakrishnan et al. [79], which are represented by a packet and a circuit switch connected to all nodes, see Figure 2(b).
Routing in hybrid switch networks is straightforward (only one path exists for each node pair in the packet switched network), but the addition of a circuit switch adds a large degree of freedom: First, the number of possible matchings grows exponentially, and second, we have to decide for each flow which path to take as well. Notwithstanding, the special structure of hybrid switch networks allows us to solve reconfiguration and routing efficiently.
We structure our approach as follows. We first introduce an auxiliary problem in Section 4.2 and a constant-time triangle graph algorithm in Section 4.3, which we then leverage for our optimal algorithm in Section 4.4. We lastly discuss performance bounds and extensions in Section 4.5.
4.2 Red-target Matching
As each configuration of an OCS must be a matching, we cannot simultaneously create a reconfigurable connection for each demand. Still, intuitively, it is desirable to relieve nodes, respectively node pairs, with high communication intensity by means of reconfigurable links. Later in our algorithms, we will mark some nodes (in red) which must be connected to the OCS in order to satisfy a given load threshold. However, not all reconfigurations, i.e., matchings, are suitable for such a task. Given such red-colored nodes, the question is whether all of them can be matched accordingly, which is formalized in Definition 4.1:
To illustrate Definition 4.1, we give an example in Figure 3. The RTM problem looks for a restricted matching, which not only satisfies the degree bound of a matching but also matches all colored nodes \(V^{\prime }\subseteq V\).
Fig. 3.
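One simple way to decide such an instance (a hedged sketch; the article's own algorithm may proceed differently) is to give every candidate link one unit of weight per red endpoint: any matching then has weight at most \(|V^{\prime }|\), with equality exactly when every red node is matched, so a maximum weight matching settles the question.

```python
# Sketch: decide whether a matching exists that matches all red nodes V'.
import networkx as nx

def red_target_matching(candidate_links, red_nodes):
    """candidate_links: iterable of reconfigurable links (u, v); red_nodes: set of nodes."""
    g = nx.Graph()
    for u, v in candidate_links:
        # weight = number of red endpoints this link would cover
        g.add_edge(u, v, weight=(u in red_nodes) + (v in red_nodes))
    matching = nx.max_weight_matching(g)
    covered = sum((u in red_nodes) + (v in red_nodes) for u, v in matching)
    return matching if covered == len(red_nodes) else None  # None: no suitable matching exists
```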
4.3 Selection of Suitable Reconfigurable Links
In the studied hybrid switch networks, reconfigurable links can be created between any pair of nodes connected to the packet switch, e.g., via an OCS. While we will select the (matching) subset of reconfigurable (bidirected) links in the next subsection, we herein identify the benefit of adding specific reconfigurable links.
It remains to utilize the single-triangle algorithm in a larger context: Lemma 4.4 shows that the optimal flow computed locally in each triangle \(\lbrace v_i,c, v_j\rbrace\) provides a lower bound for the subflow of a globally optimal flow of the hybrid switch network N and demands D in the same triangle, and Lemma 4.5 further shows that a globally optimal flow can be obtained by combining these locally optimal flows in triangles.
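As a much-simplified illustration of the balancing idea inside a single triangle (our own sketch, treating one splittable demand in isolation; it is not the article's triangle algorithm, which works on the full demand set), a demand between \(v_i\) and \(v_j\) can be split between the direct configured link and the two-hop detour via c so that both routes see equal load:

```python
# Sketch: split a single splittable demand d so that the direct link (capacity c_r)
# and the bottleneck of the detour via the center (capacity c_s) carry equal load.
def balance_triangle(d, c_r, c_s):
    direct_share = d * c_r / (c_r + c_s)   # equalizes direct_share/c_r and (d - direct_share)/c_s
    return direct_share, d - direct_share  # (on the configured link, via the center)
```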
4.4 Solving Hybrid Switch Networks Optimally
We now combine our previous results to optimally solve the reconfiguration problem on hybrid switch networks.5
We now briefly show that our algorithms also extend to the case where we can create a reconfigurable link to the central packet switch and also bound the runtime:
For example, the original Blossom algorithm [26] can be used to compute a maximum weight matching in \(\beta = O(|E ||V|^2)\), but faster maximum weight matching algorithms exist, for which we refer to the comprehensive overview by Duan and Pettie [24, Tbl. III].
4.5 Bounds and Extensions
Given that we provided optimal algorithms for hybrid switch networks above, we now investigate theoretical performance bounds and extensions. As such, we provide bounds on the improvement of the load after reconfiguration, prove that maximum matching algorithms do not perform well in terms of competitive analysis, and show how our algorithms can be extended to multiple small reconfigurable switches.
Improvement bounds. If the capacities of reconfigurable links are arbitrarily large, in comparison to the static links, then the maximum load after applying reconfiguration can become arbitrarily small, under selected scenarios. Thus, to understand the intrinsic lower bounds of the reconfiguration problem on hybrid switch networks \(N= (V,E,\mathcal {E},C)\), we investigate the case where the capacity function C is uniform, denoted by \((V,E,\mathcal {E},1)\).
For a hybrid network N with uniform capacities, the improvement of the load on an arbitrary static link \(\lbrace u,v\rbrace \in E\) relies on the additional edge-connectivity provided by the configured links in M between u and v in \(N\left(M\right)\). If a node u has only one static link \(\lbrace u,v\rbrace \in E\), then the edge-connectivity from u to v can be at most two in \(N\left(M\right)\) for any reconfiguration M, which further implies that the traffic originating at u can at best be split across these two outgoing links after reconfiguration.
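For example, under uniform capacities the matching constraint allows u at most one configured link in addition to \(\lbrace u,v\rbrace\), so all traffic originating at u must leave over at most two unit-capacity links; hence, as a direct consequence of the above observation (stated here for illustration), \(L_{\text{min-max}}\left(N\left(M\right)\right) \ge \frac{1}{2}\sum _{w\in V} D(u,w)\) for every reconfiguration \(M\subseteq \mathcal {E}\).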
Competitivity of matching algorithms. We next investigate the theoretical performance of a maximum matching algorithm, as utilized, e.g., in [82]. The idea behind a maximum matching is that for each reconfigurable link \(\lbrace u,v\rbrace \in \mathcal {E}\), we send all flows of the demands \(D(u,v)\) and \(D(v,u)\) on the links \(\left(u,v\right)\) and \(\left(v,u\right)\), respectively, and then find a maximum weight matching that maximizes the total size of flows on the set of configured links M. As it turns out, such an optimization might yield nearly no benefit, even though an optimal algorithm could hit the theoretical lower bound provided in Lemma 4.8.
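A sketch of this baseline follows (our own hedged illustration; systems such as Helios and c-Through [29, 82] differ in implementation details): weight each candidate link \(\lbrace u,v\rbrace\) by the demand volume it could absorb and configure a maximum weight matching.

```python
# Sketch: the maximum-weight-matching baseline with weights D(u, v) + D(v, u).
import networkx as nx

def matching_baseline(demands, nodes):
    """demands: dict (u, v) -> size; nodes: list of nodes; returns configured links."""
    g = nx.Graph()
    g.add_nodes_from(nodes)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            w = demands.get((u, v), 0.0) + demands.get((v, u), 0.0)
            if w > 0:
                g.add_edge(u, v, weight=w)
    return nx.max_weight_matching(g)  # maximizes the total demand offloaded to configured links
```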
Extension to smaller reconfigurable circuit switches. In case the number of ports of a single reconfigurable switch does not suffice for all nodes in the network, our algorithms also extend to the case of multiple smaller reconfigurable switches. We can connect subsets of the nodes to a reconfigurable switch each, e.g., grouped by historical data w.r.t. the traffic demands. Our hybrid switch algorithms in Section 4 then take this subset of possible reconfigurable links to work with and proceed as usual, e.g., by assigning non-allowed links a weight (benefit) of 0 in matchings.
4.6 Practical Considerations
For non-blocking6 data-center topologies, where the load-balancing is usually dominated by the last hop, e.g., for incast [69], we can abstract the static topology as a star (tree of depth one) as shown in Figure 2, such that our algorithm can minimize the loads by, e.g., moving elephant flows from the original static network to high-capacity reconfigurable links. Our solution provides an efficient and optimal way to design reconfigurable networks for existing DCNs to optimize load-balancing. As we will show in the practical evaluations of the next section, going beyond the previous theoretical results, it significantly outperforms the conventional methods of selecting reconfigurable links by a maximum weight matching, e.g., [29, 82], or by a greedy approach, e.g., [44, 91].
Our solution can be implemented directly and is generally compatible with pre-installed routing configurations of existing data centers, as it relies on analyzing the matrix of traffic demands to determine which pairs of intensively communicating nodes should be served by reconfigurable links. More specifically, after preprocessing the demands, elephant flows can be sent separately over reconfigurable links, while the remaining demands are still routed through the static network as before; e.g., ECMP, packet-based routing, or flowlet-based routing can be applied for these remaining flows.
For real-time applications, the reconfiguration delay, i.e., the time during which reconfigurable links cannot transfer data while being established, might degrade performance when the traffic pattern changes significantly within a short interval. However, in general, data-center traffic patterns feature significant temporal locality,7 and most transmitted bytes belong to large and long-lasting elephant flows, whose transmission time is large compared to the reconfiguration time. For example, Roy et al. [70] observed that 90% of the bytes flow in elephant flows, and Griner et al. [41] give an example of a 500 MB flow with a transmission time of 100 ms, compared to a reconfiguration time of 15 ms, while many other empirical studies show similar results, e.g., Mellette et al. [61] and Venkatakrishnan et al. [80]. Based on these practical observations, we introduce a factor \(\theta \in [0,1]\) to indicate the ratio of the reconfiguration time to the interval of a demand in our evaluations, and we discuss the results for \(\theta =0\) and \(\theta =0.05\), respectively, in Section 5, which reveals the robustness of our algorithm under the interference of reconfiguration time.
Notwithstanding, in general, the problem of how to deal with the non-availability of optical links during reconfiguration is still an open research problem, as discussed by Nance Hall et al. [43, Section 6]: “Ideally, we want a reconfigurable link to exist before the traffic appears”, with the additional challenge of these changes being consistent [35, 49].
5 Evaluations
In order to study the performance of our algorithms under realistic workloads, we conducted extensive experiments with a simulator, which we will release together with this article (as open source code). In particular, we benchmark our hybrid switch algorithms against several state-of-the-art maximum matching and greedy baselines, considering a spectrum of packet traces on hybrid switch topologies as in Figure 2. We first describe our methodology in Section 5.1 and then discuss our results in Section 5.2. To facilitate reproducibility, our source code is available at https://gitlab.cs.univie.ac.at/ct-papers/2021-tompecs-load-optimization.
5.1 Methodology
Comparison with related work. We consider the following approaches from related work, used in multiple state-of-the-art articles [43], as described next, and implemented the corresponding algorithms for comparison.
—
First, we compare our hybrid switch network algorithms (denoted by HSN-US/SN) with a Maximum Weight Matching algorithm as a baseline, where routing occurs either on direct reconfigurable links or via the central packet switch. The matching algorithm is employed by many state-of-the-art systems [61, Table 1], and also recently, e.g., in Chopin [71]. Its use was spearheaded by Helios and c-Through [29, 82], and it is also optimal w.r.t. the average weighted path length [33] in such a routing model.8
—
Second, we also compare to a Greedy approach used by, e.g., Halperin et al. [44] and Zheng et al. [91]. For the link e that currently has the highest load, we check for the largest flow that can be rerouted on a direct connection, and offload it from the electrically switched network parts. This process is iterated until the load cannot be reduced further, where different links e can be chosen in each iteration.
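A simplified sketch of this greedy baseline on the hybrid switch (star) topology follows (our own hedged reconstruction; the original implementations [44, 91] differ in details such as capacity handling):

```python
# Sketch: repeatedly pick the most loaded static link and offload the largest demand on it
# (both directions) onto a direct configured link, respecting the matching constraint;
# stop when no further improvement is possible.
from collections import defaultdict

def greedy_reconfigure(demands, capacity=1.0):
    """demands: dict (src, dst) -> size, on a star with center 'c'."""
    load = defaultdict(float)
    for (s, t), d in demands.items():          # static path of (s, t) is s -> c -> t
        load[(s, 'c')] += d / capacity
        load[('c', t)] += d / capacity
    matched, configured = set(), set()
    improved = True
    while improved:
        improved = False
        for link in sorted(load, key=load.get, reverse=True):
            cands = [(s, t) for (s, t) in demands
                     if link in {(s, 'c'), ('c', t)}
                     and s not in matched and t not in matched]
            if not cands:
                continue
            s, t = max(cands, key=lambda p: demands[p])
            matched |= {s, t}
            configured.add((s, t))
            for a, b in ((s, t), (t, s)):       # offload both directions of the configured pair
                d = demands.get((a, b), 0.0)
                load[(a, 'c')] -= d / capacity
                load[('c', b)] -= d / capacity
                load[('ocs', a, b)] = d / capacity
            improved = True
            break
    return configured, max(load.values())
```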
Hence, in the following plots, the approaches that correspond to related work are labeled as Max Weight Matching and Greedy, respectively. Lastly, we additionally plot the maximum load on the network before any reconfiguration was applied (labeled as Oblivious).
Traffic workloads. It is known that traffic traces in different networks and running different applications can differ significantly [7, 13, 37, 51, 70]. Thus, we collected a number of real-world and synthetic datasets from which we generate traffic matrices to evaluate and compare the performance of our algorithms. In particular:
—
Data center traces: We consider two data center workloads, based on traces made available by Facebook [28, 70, 90]. The first workload features traces from a cluster running the batch-processing application Hadoop. The second one consists of traces from a cluster running SQL databases. Both workloads differ heavily in their communication patterns and the overall network load. Hence, the structural and temporal patterns of the workloads are quite different [7].
—
HPC traces: We further consider a high performance computing workload, obtained from the CESAR backbone [2]. The workload consists of a collection of MPI traffic, which was collected while running the application Nekbone. The application solves Poisson equations using the conjugate gradient method.
—
Synthetic traces: The synthetic pFabric traces are frequently considered as benchmarks in scientific evaluations [6]. In a nutshell, workloads arrive according to a Poisson process, are embedded in a data center context, and follow a random communication pattern between subsets of nodes. In order to generate traffic traces and produce demand matrices, we use the NS2 simulation script we obtained from the authors, using the parameter \(p=0.5\).
In more detail, for each simulation setting, e.g., 100–1,000 or 1,000–3,000 nodes, we pre-fetch a sequence of requests and keep it in memory. For example, to observe 3,000 distinct nodes in the case of Facebook’s data center traffic, we have to fetch a much larger traffic sequence than in the case of 1,000 distinct nodes. Furthermore, to ensure fairness, the fetched traffic sequence does not stop at the last node discovered, but rather goes slightly beyond that, to allow the last discovered node to eventually be observed a few times in subsequent requests. Subsequently, depending on the current number of nodes n, we only use the requests from the fetched sequence where traffic occurs between those n nodes. Hence, the computational workload for, e.g., 1,000 nodes is higher in the setting of 1,000–3,000 nodes than in the setting of 100–1,000 nodes.
Reconfiguration delay. In order to model the reconfiguration delays of optical circuit switches, our approach is the following. We account for the reconfiguration delay by introducing a penalty parameter \(\theta \in [0,1]\), which denotes the fraction of time per traffic sequence that a switch needs for reconfiguration. We first compute the optimized load of the network as if no reconfiguration delay applies. Then, we query the network for the optical link load and redistribute \((load \cdot \theta)\) bytes from the optical link to the electrical links. Finally, we query the network again for the maximum load.
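For clarity, the penalty step can be summarized by the following sketch (our illustration of the procedure described above; variable names are assumptions, not the simulator's API):

```python
# Sketch: move a theta-fraction of each optical link's load back onto the
# corresponding electrical (static) path, then report the resulting maximum load.
def apply_reconfiguration_penalty(optical_load, electrical_load, theta=0.05):
    """optical_load: dict (s, t) -> load on the configured link for that pair;
    electrical_load: dict link -> load on static links of the star with center 'c'."""
    for (s, t), load in optical_load.items():
        shifted = load * theta
        optical_load[(s, t)] = load - shifted
        electrical_load[(s, 'c')] = electrical_load.get((s, 'c'), 0.0) + shifted
        electrical_load[('c', t)] = electrical_load.get(('c', t), 0.0) + shifted
    return max(list(optical_load.values()) + list(electrical_load.values()))
```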
Experimental setup. All considered topologies, ranging from 40 to 3,000 nodes,9 employ hybrid switch networks as in Figure 2(b).10 We repeat each setting by running it 5 times and display the averaged results, normalizing the workload in the static topology. For the runtime, we also display the averaged results, normalizing them against the results of our HSN-SN algorithm.
Our simulations were run on a machine with two Intel Xeons E5-2697V3 SR1XF with 2.6 GHz, 14 cores11 each and a total of 128 GB RAM. The host machine was running Ubuntu 18.04.3 LTS.
We implemented the algorithms in Python (3.7.3) leveraging the NetworkX library (2.3). For the implementation of the maximum matching algorithm we used the algorithm provided by NetworkX.
5.2 Results and Discussion
We report on the main results obtained in our simulations based on the different datasets. Figures 4 and 6 summarize our evaluation results in terms of load and runtime for the Facebook traces; Figure 5 shows the corresponding results for the HPC and pFabric traces.
Fig. 4.
Fig. 5.
Potential for load optimization. All algorithms significantly improve the load over the Oblivious baseline and provide relatively stable benefits throughout all scenarios investigated.
We evaluate all algorithms with and without a reconfiguration delay, where the dashed lines in the maximum load plots correspond to the results achieved with a reconfiguration delay applied. The reconfiguration delay penalty \(\theta\) is set to 0.05 in all experiments. Hence, \(5\%\) of the load on an optical link is redistributed to the electrical links to account for the reconfiguration delay.
Among these algorithms, the HSN algorithms typically clearly outperform the others.
More specifically, for the database (Figures 4(a) and 6(a)) clusters, the reduction in the maximum load provided by the HSN-SN algorithm is almost a factor of two throughout the spectrum.
Fig. 6.
For the Hadoop clusters (Figures 4(c) and 6(b)), the performance of HSN-SN slightly decreases, but it still achieves \(\approx 60\%\) of the original Oblivious load up until a network size of 1,000 and then stays stable at \(\approx 70\%\) beyond. The three remaining algorithms (Greedy, Max. Weight Matching, and our HSN-US) achieve nearly identical values, with Greedy and HSN-US being slightly better. Above 1,000 nodes, we can observe that their capability to further reduce the load seems to be quite restricted. In some Hadoop workload instances, Max. Weight Matching achieves no or only minimal load reduction, matching Lemma 4.9 in practice. Notwithstanding, these three algorithms always perform significantly worse than HSN-SN, with a comparative load increase of \(\approx 60\%\).
Regarding the HPC traces, we can observe similar results as for the Database Cluster in terms of maximum load reduction. Also for the pFabric traces, our HSN-US algorithm achieves a lower maximum load compared to Greedy or Max. Weight Matching. Here, the variance is slightly higher than in the other experiments; this matches empirical observations on the complexity of the traces produced by this synthetic workload [7].
In regard to the maximum load reduction, we conclude that our HSN-SN algorithm is quite stable w.r.t. the number of nodes in the network. In contrast, Max. Weight Matching and the Greedy algorithm asymptotically approach the maximum load of the unconfigured network.
With respect to the results achieved while using the reconfiguration delay penalty \(\theta\), we can observe that the maximum link load is slightly higher for all algorithms. However, the simulations show that the reconfiguration delay penalty has a larger impact on our HSN-SN algorithm. The reason for this is that the HSN-SN algorithm is capable of distributing the traffic load more equally between the optical and electrical links. Therefore, redistributing \(5\%\) (\(\theta = 0.05\)) of the load from the optical links to the electrical links results in an approximately \(5\%\) increase of the load on the electrical link, which then carries the maximum load. Compared to that, the other algorithms fail to offload a significant amount of traffic to the optical links. Hence, the reconfiguration delay has a minor influence on their maximum link load because the electrical links already carry the vast majority of the traffic.
Runtime performance. The best runtime is generally achieved by the Greedy algorithm, due to its early termination when no link can be added anymore. Our experiments show that in the case of the Greedy algorithm, this is unfortunately happening very early on. Regarding the runtime of the Max. Weight Matching, we want to emphasize that the algorithm is unaware of the underlying problem of reducing the maximum link load. Therefore, a lot of runtime is actually wasted without achieving any further load reduction. Hence, in some cases, e.g., in the larger Facebook clusters, Max. Weight Matching is even slower than HSN-SN. In comparison to Max. Weight Matching, our HSN-US has a similar runtime, while spending all of it searching for the best load reduction matching.
HSN-US is consistently faster than HSN-SN, and the latter features quite a high variance in runtime. Notwithstanding, HSN-US has the benefit of only routing along single paths, which can be beneficial for performance metrics beyond load [72, 87]. On the other hand, such issues can also be alleviated with specialized multipath protocols [23, 68, 83]. Still, in some cases and specific workloads, the routing of related demands becomes easier in the SN model. Hence, HSN-SN can even be slightly faster than HSN-US, such as for the Hadoop cluster at 3,000 nodes, due to the fact that the underlying matching problem is identical for both HSN-US/-SN.
Summary. While all algorithms provide load reductions, the extent of these optimizations and the required runtime differ significantly. Our results suggest that the load optimizations provided by HSN-US might prove beneficial over other segregated routing strategies, particularly because of its low runtime which is comparable to that of the Max. Weight Matching. We conclude that when considering both potential load reduction and runtime, HSN-SN provides a better tradeoff than HSN-US.
6 Related Work
Most related work on flow routing in data center networks focuses on non-reconfigurable topologies [64]. That said, many recent works design and evaluate reconfigurable topologies e.g., [17, 29, 37, 44, 55, 56, 59, 60, 61, 67, 80, 81, 82, 85, 86], often showing significant performance gains over static topologies and proving real-world viability. However, the algorithmic complexity of reconfigurable data center networks is mostly unstudied [34], and many fundamental questions remain open [11].
The scheduling of traffic matrices with specific skew was investigated in [56, 57, 67, 80], but performance guarantees were only obtained by Venkatakrishnan et al. [80] by leveraging submodularity, a condition that does not hold in our setting. Similarly, Avin et al. [8, 9, 10] investigate traffic matrices with low entropy, but they require scalable constant reconfigurable degrees and are oblivious to hybrid networks, as in [16, 65], and thus do not translate to the herein considered model.
The idea of leveraging good connectivity in data center contexts arose from utilizing random graphs [75] and was later extended to deterministic constructions [22, 52, 77]. Xia et al. [86] used this idea to heuristically switch between random graphs and Clos topologies, depending on the traffic pattern, whereas Mellette et al. [60] incorporate it to improve their RotorNet [61] approach: If a flow cannot be delayed respectively buffered, it gets sent along a short route. Both works of Mellette et al. also have the benefit that their reconfigurations are oblivious to the current traffic pattern, but their resulting performance hence also depends on it.
One of the notable works that does not rely on centralized computation is ProjecToR by Ghobadi et al. [37], which instead performs a distributed matching protocol reminiscent of the idea of stable matchings [1]. In their setting, they obtain a \((2+\varepsilon)\)-approximation for the weighted latency objective but do not consider load.
The algorithmic complexity of weighted latency was also considered in [32, 33], where already basic topologies and settings turned out to be intractable. On the other hand, finding a single shortest path in a partially reconfigured network can be done efficiently and hence yields well-performing heuristics [31]. Moreover, some routing models can even be solved optimally. Notwithstanding, it is unclear how to transfer these results to a load-optimization setting: in topologies with unfavorable betweenness centrality, shortest path routing can overload popular links.
Load-optimization in reconfigurable data centers was recently studied by Yang et al. [87], who investigated the impact of wireless interference on cross-layer optimization. Different wireless links are modeled as a conflict graph, where the task is to find sufficiently good independent (link) sets, in order to provide an interference-free reconfiguration. We see our work as orthogonal, as we only consider inherently interference-free technologies, and thus it would be interesting to leverage their results in future work.
Another interesting line of work is by Zheng et al. [91], who study how to enhance the design of Diamond, BCube, and VL2 network topologies with small reconfigurable switches, inspired by Flat-Tree [86]. They target maximum link load as well and present intractability results on general graphs, although these results do not transfer to specific data center topologies or trees, respectively. Different routing models are not analyzed. Moreover, they propose to reconfigure the network with a greedy algorithm, which however does not come with formal performance guarantees. In evaluations on small network sizes, their combination of greedy algorithm and enhanced network design reduces the maximum load by \(12\%\) on average. We observe similar greedy behavior in our evaluations; however, the greedy algorithm’s load improvement decreases to just a few percent as the network size grows.
That being said, even though our work is mostly motivated by technologies emerging in data center networks, it also applies to other reconfigurable technologies, as long as they fulfill our model properties. Fundamentally different, however, are reconfigurable optical wide-area networks, as therein the fiber connectivity is fixed; instead, capacities can be adjusted and alternative failover paths provided, leading to improvements in the scheduling of bulk transfers [21, 48, 49, 58] and in addressing reliability concerns [39, 42, 74, 92].
7 Conclusion
We investigated load minimization in reconfigurable hybrid networks, leveraging the flexibility of emerging programmable physical layers. To this end, we investigated the underlying problem complexity, unveiling that already tree topologies of small height induce intractability for a multitude of routing models, and that one cannot hope for general approximability via submodularity techniques. Notwithstanding, we showed that hybrid switch networks, and in turn, non-blocking data center interconnects, can be optimized efficiently. Trace-driven simulations show that our hybrid switch algorithms significantly outperform not only a state-of-the-art maximum matching baseline but also greedy algorithms.
Footnotes
1
Symmetric connectivity is the standard industry assumption for static cabling, and it holds for reconfigurable links as well. Outside highly experimental hardware, e.g., [37], off-the-shelf products use full-duplex connections [14, 66], and this model assumption is hence prevalent, even in Free-Space Optics proposals [12].
2
In other words, no two links in M are adjacent or share an endpoint, enforced by hardware constraints in practice (exclusive connections between ports). We refer to Hecht [47] for an introduction to the technological background.
3
We note that in other works with analogous definitions, load might also be denoted by utilization, and load-optimization by load-balancing.
4
We note that the non-blocking property can also be restricted to keep distributed routing schemes in mind; we refer to Yuan [89] for an in-depth discussion.
5
Recall that the UN model is NP-hard on hybrid switch networks (Section 3.2).
6
If the DC topology is blocking, such as, e.g., DCell, Jellyfish, MDCube etc. [53], i.e., in particular in server-centric proposals, we cannot directly apply our algorithms, unless the topologies are augmented to be non-blocking. An extension of our optimal polynomial-time algorithms to general server-centric topologies is unlikely, as we have shown that already simple topologies beyond stars induce intractability.
7
This is not always the case for wide-area networks [30].
8
Note that a maximum matching algorithm is not optimal regarding path lengths in all topologies. However, when the distances between all nodes are identical in the static network part, a standard maximum matching approach is optimal in hybrid switch networks w.r.t. weighted path length [33].
9
See Alistarh et al. [5] w.r.t. the feasibility of 1,000 port optical switches in data centers.
10
In other words, we assume that the static networks can be abstracted as trees of depth one, due to them being, e.g., non-blocking, such as for fat-trees or Clos topologies in general.
11
However, each algorithm only utilized a single core.
Dan Alistarh, Hitesh Ballani, Paolo Costa, Adam C. Funnell, Joshua Benjamin, Philip M. Watts, and Benn Thomsen. 2015. A high-radix, low-latency optical switch for data centers. Computer Communication Review 45, 5 (2015), 367–368.
Mohammad Alizadeh, Shuang Yang, Milad Sharif, Sachin Katti, Nick McKeown, Balaji Prabhakar, and Scott Shenker. 2013. pFabric: Minimal near-optimal datacenter transport. In Proceedings of the SIGCOMM. ACM.
Chen Avin, Manya Ghobadi, Chen Griner, and Stefan Schmid. 2020. On the complexity of traffic traces and implications. In Proceedings of the ACM SIGMETRICS.
Chen Avin, Alexandr Hercules, Andreas Loukas, and Stefan Schmid. 2018. rDAN: Toward robust demand-aware network designs. Information Processing Letters 133 (2018), 5–9.
Chen Avin, Kaushik Mondal, and Stefan Schmid. 2019. Demand-aware network design with minimal congestion and route lengths. In Proceedings of the INFOCOM. IEEE.
Chen Avin and Stefan Schmid. 2018. Toward demand-aware networking: A theory for self-adjusting networks. Computer Communication Review 48, 5 (2018), 31–40.
Navid Hamed Azimi, Zafar Ayyub Qazi, Himanshu Gupta, Vyas Sekar, Samir R. Das, Jon P. Longtin, Himanshu Shah, and Ashish Tanwer. 2014. FireFly: A reconfigurable wireless data center fabric using free-space optics. In Proceedings of the SIGCOMM. ACM.
Theophilus Benson, Aditya Akella, and David A. Maltz. 2010. Network traffic characteristics of data centers in the wild. In Proceedings of the Internet Measurement Conference. ACM.
Jiaxin Cao, Rui Xia, Pengkun Yang, Chuanxiong Guo, Guohan Lu, Lihua Yuan, Yixin Zheng, Haitao Wu, Yongqiang Xiong, and David A. Maltz. 2013. Per-packet load-balanced, low-latency routing for clos-based data center networks. In Proceedings of the CoNEXT. ACM.
Esra Ceylan, Klaus-Tycho Foerster, Stefan Schmid, and Katsiaryna Zaitsava. 2021. Demand-aware plane spanners of bounded degree. In Proceedings of the Networking. IEEE, 1–9.
Kai Chen, Ankit Singla, Atul Singh, Kishore Ramachandran, Lei Xu, Yueping Zhang, Xitao Wen, and Yan Chen. 2014. OSA: An optical switching architecture for data center networks with unprecedented flexibility. IEEE/ACM Transactions on Networking 22, 2 (2014), 498–511.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition (3rd. ed.). The MIT Press.
Wenkai Dai, Michael Dinitz, Klaus-Tycho Foerster, and Stefan Schmid. 2022. Brief announcement: Minimizing congestion in hybrid demand-aware network topologies. In Proceedings of the 36th International Symposium on Distributed Computing. Christian Scheideler (Ed.), Leibniz International Proceedings in Informatics, Vol. 246, Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 42:1–42:3.
Michael Dinitz and Benjamin Moseley. 2020. Scheduling for weighted flow and completion times in reconfigurable networks. In Proceedings of the INFOCOM. IEEE.
Advait Abhay Dixit, Pawan Prakash, Y. Charlie Hu, and Ramana Rao Kompella. 2013. On the impact of packet spraying in data center networks. In Proceedings of the INFOCOM. IEEE.
Nick G. Duffield, Pawan Goyal, Albert G. Greenberg, Partho Pratim Mishra, K. K. Ramakrishnan, and Jacobus E. van der Merwe. 1999. A flexible model for resource management in virtual private networks. In Proceedings of the SIGCOMM. ACM, 95–108.
Nathan Farrington, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya, Yeshaiahu Fainman, George Papen, and Amin Vahdat. 2010. Helios: A hybrid electrical/optical switch architecture for modular data centers. In Proceedings of the SIGCOMM. ACM.
Thomas Fenz, Klaus-Tycho Foerster, and Stefan Schmid. 2021. On efficient oblivious wavelength assignments for programmable wide-area topologies. In Proceedings of the ANCS. ACM, 38–51.
Klaus-Tycho Foerster, Maciej Pacut, and Stefan Schmid. 2019. On the complexity of non-segregated routing in reconfigurable data center architectures. Computer Communication Review 49, 2 (2019), 2–8.
Klaus-T. Foerster, Manya Ghobadi, and Stefan Schmid. 2018. Characterizing the algorithmic complexity of reconfigurable data center architectures. In Proceedings of the ANCS. ACM.
Klaus-T. Foerster and Stefan Schmid. 2019. Survey of reconfigurable data center networks: Enablers, algorithms, complexity. SIGACT News 50, 2 (2019), 62–79.
Monia Ghobadi, Ratul Mahajan, Amar Phanishayee, Nikhil R. Devanur, Janardhan Kulkarni, Gireeja Ranade, Pierre-Alexandre Blanche, Houman Rastegarfar, Madeleine Glick, and Daniel C. Kilper. 2016. ProjecToR: Agile reconfigurable data center interconnect. In Proceedings of the SIGCOMM. ACM.
Michel X. Goemans, Nicholas J. A. Harvey, Satoru Iwata, and Vahab Mirrokni. 2009. Approximating submodular functions everywhere. In Proceedings of the SODA.
Jennifer Gossels, Gagan Choudhury, and Jennifer Rexford. 2019. Robust network design for IP/optical backbones. Journal of Optical Communications and Networking 11, 8 (2019), 478–490.
Albert G. Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. 2011. VL2: A scalable and flexible data center network. Communications of the ACM 54, 3 (2011), 95–104.
Chen Griner, Johannes Zerwas, Andreas Blenk, Manya Ghobadi, Stefan Schmid, and Chen Avin. 2021. Cerberus: The power of choices in datacenter topology design - A throughput perspective. Proceedings of the ACM on Measurement and Analysis of Computing Systems 5, 3 (2021), 33 pages.
Matthew Nance Hall, Paul Barford, Klaus-Tycho Foerster, Manya Ghobadi, William Jensen, and Ramakrishnan Durairajan. 2021. Are WANs ready for optical topology programming?. In Proceedings of the OptSys@SIGCOMM. ACM, 28–33.
Matthew Nance Hall, Klaus-Tycho Foerster, Stefan Schmid, and Ramakrishnan Durairajan. 2021. A survey of reconfigurable optical networks. Optical Switching and Networking 41 (2021), 100621.
Daniel Halperin, Srikanth Kandula, Jitendra Padhye, Paramvir Bahl, and David Wetherall. 2011. Augmenting data center networks with multi-gigabit wireless links. In Proceedings of the SIGCOMM. ACM.
Abdelbaset S. Hamza, Jitender S. Deogun, and Dennis R. Alexander. 2016. Wireless communication in data centers: A survey. IEEE Communications Surveys and Tutorials 18, 3 (2016), 1572–1595.
Su Jia, Xin Jin, Golnaz Ghasemiesfeh, Jiaxin Ding, and Jie Gao. 2017. Competitive analysis for online scheduling in software-defined optical WAN. In Proceedings of the INFOCOM. IEEE.
Xin Jin, Yiran Li, Da Wei, Siming Li, Jie Gao, Lei Xu, Guangzhi Li, Wei Xu, and Jennifer Rexford. 2016. Optimizing bulk transfers with software-defined optical WAN. In Proceedings of the SIGCOMM. ACM, 87–100.
Christoforos Kachris and Ioannis Tomkos. 2012. A survey on optical interconnects for data centers. IEEE Communications Surveys and Tutorials 14, 4 (2012), 1021–1036.
Srikanth Kandula, Sudipta Sengupta, Albert G. Greenberg, Parveen Patel, and Ronnie Chaiken. 2009. The nature of data center traffic: Measurements and analysis. In Proceedings of the Internet Measurement Conference. ACM.
Simon Kassing, Asaf Valadarsky, Gal Shahaf, Michael Schapira, and Ankit Singla. 2017. Beyond fat-trees without antennae, mirrors, and disco-balls. In Proceedings of the SIGCOMM. ACM.
Brian Lebiednik, Aman Mangal, and Niharika Tiwari. 2016. A survey and evaluation of data center network topologies. CoRR abs/1605.01701 (2016). Retrieved from http://arxiv.org/abs/1605.01701
He Liu, Feng Lu, Alex Forencich, Rishi Kapoor, Malveeka Tewari, Geoffrey M. Voelker, George Papen, Alex C. Snoeren, and George Porter. 2014. Circuit switching under the radar with REACToR. In Proceedings of the NSDI. USENIX Association.
He Liu, Matthew K. Mukerjee, Conglong Li, Nicolas Feltman, George Papen, Stefan Savage, Srinivasan Seshan, Geoffrey M. Voelker, David G. Andersen, Michael Kaminsky, George Porter, and Alex C. Snoeren. 2015. Scheduling techniques for hybrid circuit/packet networks. In Proceedings of the CoNEXT. ACM.
Long Luo, Klaus-Tycho Foerster, Stefan Schmid, and Hongfang Yu. 2020. Deadline-aware multicast transfers in software-defined optical wide-area networks. IEEEJournal on Selected Areas in Communications 38, 7 (2020), 1584–1599.
Long Luo, Klaus-Tycho Foerster, Stefan Schmid, and Hongfang Yu. 2020. SplitCast: Optimizing multicast flows in reconfigurable datacenter networks. In Proceedings of the INFOCOM. IEEE.
William M. Mellette, Rajdeep Das, Yibo Guo, Rob McGuinness, Alex C. Snoeren, and George Porter. 2020. Expanding across time to deliver bandwidth efficiency and low latency. In Proceedings of the NSDI. USENIX Association.
William M. Mellette, Rob McGuinness, Arjun Roy, Alex Forencich, George Papen, Alex C. Snoeren, and George Porter. 2017. RotorNet: A scalable, low-complexity, optical datacenter network. In Proceedings of the SIGCOMM. ACM.
William M. Mellette, Alex C. Snoeren, and George Porter. 2016. P-FatTree: A multi-channel datacenter network topology. In Proceedings of the HotNets. ACM.
Mohammad Noormohammadpour and Cauligi S. Raghavendra. 2018. Datacenter traffic control: Understanding techniques and tradeoffs. IEEE Communications Surveys and Tutorials 20, 2 (2018), 1492–1525.
Maciej Pacut, Wenkai Dai, Alexandre Labbe, Klaus-Tycho Foerster, and Stefan Schmid. 2021. Improved scalability of demand-aware datacenter topologies with minimal route lengths and congestion. Performance Evaluation 152 (2021), 102238.
George Porter, Richard D. Strong, Nathan Farrington, Alex Forencich, Pang-Chen Sun, Tajana Rosing, Yeshaiahu Fainman, George Papen, and Amin Vahdat. 2013. Integrating microsecond circuit switching into the data center. In Proceedings of the SIGCOMM. ACM.
Costin Raiciu, Sébastien Barré, Christopher Pluntke, Adam Greenhalgh, Damon Wischik, and Mark Handley. 2011. Improving datacenter performance and robustness with multipath TCP. In Proceedings of the SIGCOMM. ACM.
Yongmao Ren, Yu Zhao, Pei Liu, Ke Dou, and Jun Li. 2014. A survey on TCP Incast in data center networks. International Journal of Communication Systems 27, 8 (2014), 1160–1172.
Arjun Roy, Hongyi Zeng, Jasmeet Bagga, George Porter, and Alex C. Snoeren. 2015. Inside the social network’s (Datacenter) network. In Proceedings of the SIGCOMM. ACM, 123–137.
Neta Rozen-Schiff, Klaus-Tycho Foerster, Stefan Schmid, and David Hay. 2022. Chopin: Combining distributed and centralized schedulers for self-adjusting datacenter networks. In Proceedings of the OPODIS (LIPIcs’22). Schloss Dagstuhl - Leibniz-Zentrum für Informatik.
Siddhartha Sen, David Shue, Sunghwan Ihm, and Michael J. Freedman. 2013. Scalable, optimal flow routing in datacenters via local link balancing. In Proceedings of the CoNEXT. ACM.
Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala, Hong Liu, Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, and Amin Vahdat. 2016. Jupiter rising: A decade of clos topologies and centralized control in Google’s datacenter network. Communications of the ACM 59, 9 (2016), 88–97.
Rachee Singh, Manya Ghobadi, Klaus-T. Foerster, Mark Filer, and Phillipa Gill. 2018. RADWAN: Rate adaptive wide area network. In Proceedings of the SIGCOMM. ACM.
Ankit Singla, Chi-Yao Hong, Lucian Popa, and Philip Brighten Godfrey. 2012. Jellyfish: Networking data centers randomly. In Proceedings of the NSDI. USENIX.
Asaf Valadarsky, Gal Shahaf, Michael Dinitz, and Michael Schapira. 2016. Xpander: Towards optimal-performance datacenters. In Proceedings of the CoNEXT. ACM.
Shaileshh Bojja Venkatakrishnan, Mohammad Alizadeh, and Pramod Viswanath. 2016. Costly circuits, submodular schedules and approximate Carathéodory theorems. In Proceedings of the SIGMETRICS. ACM.
Guohui Wang, David G. Andersen, Michael Kaminsky, Michael Kozuch, T. S. Eugene Ng, Konstantina Papagiannaki, Madeleine Glick, and Lily B. Mummert. 2009. Your data center is a router: The case for reconfigurable optical circuit switched paths. In Proceedings of the HotNets. ACM.
Guohui Wang, David G. Andersen, Michael Kaminsky, Konstantina Papagiannaki, T. S. Eugene Ng, Michael Kozuch, and Michael P. Ryan. 2010. c-Through: Part-time optics in data centers. In Proceedings of the SIGCOMM. ACM.
Damon Wischik, Costin Raiciu, Adam Greenhalgh, and Mark Handley. 2011. Design, implementation and evaluation of congestion control for multipath TCP. In Proceedings of the NSDI. USENIX Association.
Wenfeng Xia, Peng Zhao, Yonggang Wen, and Haiyong Xie. 2017. A survey on data center networking (DCN): Infrastructure and operations. IEEE Communications Surveys and Tutorials 19, 1 (2017), 640–656.
Yiting Xia, T. S. Eugene Ng, and Xiaoye Steven Sun. 2015. Blast: Accelerating high-performance data analytics applications by optical multicast. In Proceedings of the INFOCOM.
Yiting Xia, Xiaoye Steven Sun, Simbarashe Dzinamarira, Dingming Wu, Xin Sunny Huang, and T. S. Eugene Ng. 2017. A tale of two topologies: Exploring convertible data center network architectures with flat-tree. In Proceedings of the SIGCOMM. ACM.
Zhenjie Yang, Yong Cui, Shihan Xiao, Xin Wang, Minming Li, Chuming Li, and Yadong Liu. 2019. Achieving efficient routing in reconfigurable DCNs. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3, 3 (2019), 47:1–47:30.
Jiaqi Zheng, Qiming Zheng, Xiaofeng Gao, and Guihai Chen. 2019. Dynamic load balancing in hybrid switching data center networks with converters. In Proceedings of the ICPP. ACM.
Zhizhen Zhong, Manya Ghobadi, Alaa Khaddaj, Jonathan Leach, Yiting Xia, and Ying Zhang. 2021. ARROW: Restoration-aware traffic engineering. In Proceedings of the SIGCOMM. ACM, 560–579.