Katherine Barabash

    Cloud computing has been transforming the networking landscape over the last few years. The first order of business for major cloud providers today is to attract as many organizations as possible to their own clouds. To that end, cloud providers offer a new generation of managed network solutions to connect enterprise premises to their clouds. To serve their customers better and to innovate fast, major cloud providers are currently building their own "private Internets", which are idiosyncratic. On the other hand, customers who do not want to be locked in by vendors and who want the flexibility to use best-for-the-task services spanning multiple clouds and, possibly, their own premises, seek solutions that provide smart overlay connectivity across clouds. The result of these developments is a multiplication of closed idiosyncratic solutions rather than an open standardized ecosystem. In this editorial note we argue for the desirability of such an ecosys...
    This research is done in the context of the SliceNet project [4], which aims to extend 5G infrastructure with cognitive management of cross-domain, cross-layer network slices [1], with an emphasis on Quality of Experience (QoE) for vertical industries. The provisioning of network slices with proper QoE guarantees is seen as one of the key enablers of future 5G-enabled networks. The challenge is to assess the QoE experienced by the vertical application and its users without requiring the applications or the users to measure and report QoE-related metrics back to the provider. To address this challenge, we propose a method for deriving application-level QoE from network-level Quality of Service (QoS) measurements, which are easily accessible to the provider. In particular, we describe a PoC in which the QoE perceived by application users is estimated from low-level network monitoring data by applying cognitive methods. Our main goal is to enable the cloud provider to support the desired E2E QoE-based Service Level Agreements (SLAs), e.g., by monitoring QoS metrics within the provider's domain to optimize resource allocation through the provider's actuators. An additional benefit can be achieved by applying the same technique to troubleshoot issues in the provider's infrastructure. In this work, we employed classical statistical methods to assess the relationship between application-level QoE and network-level QoS.
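    The abstract does not give the concrete statistical model; the following is only a minimal sketch of the general idea under assumed inputs: a QoE score (e.g., a MOS-like rating) is regressed on network-level QoS metrics such as latency, jitter, and packet loss, so that QoE can later be estimated from QoS measurements alone. The metrics, sample values, and linear model are illustrative assumptions, not the paper's method.

```python
# Sketch only: estimate application-level QoE from network-level QoS
# measurements via ordinary least squares. The feature choice (latency,
# jitter, packet loss), the sample values, and the linear model are
# illustrative assumptions, not the model used in the paper.
import numpy as np

# Each row: [latency_ms, jitter_ms, packet_loss_pct]; target: QoE score (e.g., MOS 1-5)
qos = np.array([
    [20.0, 2.0, 0.0],
    [45.0, 5.0, 0.1],
    [80.0, 12.0, 0.5],
    [150.0, 30.0, 2.0],
])
qoe = np.array([4.6, 4.1, 3.2, 1.9])

# Fit QoE ~ w0 + w . qos with least squares
X = np.hstack([np.ones((qos.shape[0], 1)), qos])
w, *_ = np.linalg.lstsq(X, qoe, rcond=None)

def estimate_qoe(latency_ms, jitter_ms, loss_pct):
    """Predict QoE from QoS metrics available inside the provider's domain."""
    return float(w @ np.array([1.0, latency_ms, jitter_ms, loss_pct]))

print(estimate_qoe(60.0, 8.0, 0.3))
```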
    Network slicing is an essential 5G innovation whereby the network is partitioned into logical segments, so that Communication Service Providers (CSPs) can offer differentiated services for verticals and use cases. In many 5G use cases, network requirements vary over time, and CSPs must dynamically adapt network slices to satisfy the contractual network slice QoS, cooperating and using each other's resources, e.g., when the resources of a single CSP are not sufficient or suitable to maintain all its current SLAs. While this need for dynamic cross-CSP cooperation is widely recognized, realizing it is not yet possible due to gaps both in business processes and in technical capabilities. In this paper, we present the 5GZORRO approach to dynamic cross-CSP slice scaling. Our approach both enables CSPs to collaborate, providing security and trust with smart multi-party contracts, and leverages this collaboration to enable resource sharing across multiple administrative domains, either during slice establishment or when an existing slice needs to expand or shrink. Our approach allows automating both the business and the technical processes involved in dynamic lifecycle management of cross-CSP network slices, following ETSI's Zero-Touch Network and Service Management (ZSM) closed-loop architecture and relying on a resource-sharing Marketplace, a Distributed Ledger (DL), and an Operational Data Lake. We show how this approach is realized in a truly Cloud Native way, with Kubernetes as both the business and the technical cross-domain orchestrator. We then showcase the applicability of the proposed solution for dynamic scaling of a Content Delivery Network (CDN) service.
    Operating a cloud-scale service is a huge challenge. There are millions of users worldwide and millions of requests per second. For example, Amazon's Simple Storage Service (S3) in 2013 contained two trillion objects, and its logs grew by 1.1 million log lines per second, amounting to approximately 10 PB of log records per year (see [1]). Cloud scale implies thousands of servers and network elements, and hundreds of services from multiple cross-regional data centers. Cloud service operation data is scattered over various types of semi-structured and unstructured logs (e.g., application, error, debug), telemetry and network data, as well as customer service records. It is therefore extremely difficult for the multiple owners and administrators of such systems, coming from different units of the organization, to follow the possible paths and system alternatives in order to detect problems, solve issues, and understand the service operation.
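    As a rough sanity check of the cited figure, 1.1 million log lines per second accumulate to roughly 10 PB per year if one assumes an average log line of about 300 bytes (the line size is an assumption made here for illustration; the source does not state it):

```python
# Back-of-the-envelope check of the ~10 PB/year figure.
# The average log line size (~300 bytes) is an assumed value.
LINES_PER_SECOND = 1.1e6
SECONDS_PER_YEAR = 365 * 24 * 3600          # ~3.15e7 seconds
AVG_LINE_BYTES = 300                        # assumption, not from the source

bytes_per_year = LINES_PER_SECOND * SECONDS_PER_YEAR * AVG_LINE_BYTES
print(f"{bytes_per_year / 1e15:.1f} PB/year")   # ~10.4 PB/year
```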
    Network slicing is a fundamental architectural feature of 5G network infrastructure [2], whereby independent end-to-end logical networks support a wide spectrum of vertical industries on shared resources, despite their diverging requirements. Network slice management is challenging and complicated, taking into account the various intra- and inter-domain deployment scenarios, divergent use cases with different requirements, stakeholders with different roles and business models, etc. The SliceNet project [1] aims to extend 5G infrastructure with cognitive management, control, and orchestration of cross-domain/cross-layer slices, to maximize the potential of the infrastructure, with an emphasis on Quality of Experience (QoE) for vertical industries. SliceNet management takes a “verticals in the whole loop” approach, integrating the vertical perspective into the slice management process. SliceNet investigates three use cases, Smart City, Smart Grid, and eHealth, each with its distinct QoE require...
    Cloud computing gave rise to a Cloud-native [1] approach to operating application software in the cloud, whereby applications are segmented into micro-services that can be designed and deployed independently of each other. This significantly increases application maintainability, reduces time to market, and helps leverage the cloud computing model. On the other hand, this approach increases the system-level complexity of the application and poses new challenges, such as how services discover each other and how the application handles individual service upgrades. To support the cloud-native paradigm, new development, deployment, and orchestration tools have been created. One such tool is the Istio [2] service mesh, built to connect, secure, control, and observe services. While immensely useful to application developers, Istio is an additional layer in the cloud platform software stack and is thus prone to failure or misuse. In this work, we address the question of how to explore and troubleshoo...
    Garbage Collection (GC) is a process of automatic memory reclamation from objects that are no longer required by the mutator. The execution time of the application and the memory reclaimed by the garbage collector are important factors that influence the selection of a specific garbage collector. This paper selects a suitable garbage collector on the basis of these two factors. The execution time of the SPECjvm2008 benchmarks and the memory reclaimed by the garbage collector while executing these applications are measured on a real JVM. We further propose optimal values of these two factors that can be traded off when an application is to be executed with a given garbage collector.
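    The abstract does not state how the two factors are combined; a minimal sketch of one possible selection rule, using a simple weighted score over the two measurements, is shown below. The collector names and numbers are hypothetical placeholders, not results from the paper.

```python
# Illustrative sketch (not the paper's method): pick a garbage collector by
# combining two measured factors -- benchmark execution time (lower is better)
# and memory reclaimed per GC cycle (higher is better). All values are
# hypothetical placeholders.
measurements = {
    "serial":   {"exec_time_s": 412.0, "reclaimed_mb": 180.0},
    "parallel": {"exec_time_s": 355.0, "reclaimed_mb": 210.0},
    "cms":      {"exec_time_s": 368.0, "reclaimed_mb": 195.0},
}

def score(m, time_weight=0.5):
    """Weighted score: normalized reclaimed memory minus normalized execution time."""
    max_time = max(v["exec_time_s"] for v in measurements.values())
    max_mem = max(v["reclaimed_mb"] for v in measurements.values())
    return ((1 - time_weight) * m["reclaimed_mb"] / max_mem
            - time_weight * m["exec_time_s"] / max_time)

best = max(measurements, key=lambda name: score(measurements[name]))
print(f"selected collector: {best}")
```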
    Cloud computing technology enables uniform access to shared pools of configurable system resources and higher-level services, rapidly provisioned with minimal management effort. Cloud computing relies on sharing resources to achieve coherence and economies of scale, through virtualization. The cloud network, in particular, is virtualized through multiple logical constructs and software layers, making cloud connectivity complex to configure, debug, and visualize. In this work, we show how to detect cloud network operational issues through monitoring and analytics, using and enhancing the open-source network analyzer Skydive [2]. In particular, we focus on the Noisy Neighbor Effect, a situation in which a common resource is monopolized by a noisy tenant, resulting in performance degradation experienced by other tenants. Skydive is an open-source network topology and protocol analyzer, capable of discovering and visualizing cloud network topology across its multiple layers, as well as capturing network traffic at programmable granularity, injecting network traffic, and more. A typical Skydive setup consists of multiple Skydive agents installed on various network components and one or more Skydive analyzers deployed on any compute resource in the cloud. Skydive agents discover and report the information to a Skydive analyzer, which stores it over time so it can be consumed via a Web UI, command-line tools, and a REST API for visualization, exploration, and analytics. In our work we used Skydive to investigate and detect the Noisy Neighbor Effect in a Kubernetes (k8s) network. Our setup consisted of a commercial cloud platform, IBM Cloud Private (ICP) [1], running an HTTP server and two HTTP clients constantly sending requests to the server, all three being containerized Python applications, as shown in Figure 1. We installed Skydive agents on all the k8s worker nodes. To achieve our goal of detecting anomalous client behavior and creating a visual indication of such an anomaly in the Skydive UI, we enhanced Skydive's capabilities and contributed our enhancements back to the project, by extending the Python REST client library to support traffic injection and by fixing existing bugs in the Skydive system. We used those enhancements to measure the Round Trip Time (RTT) between nodes in the cloud network, detect anomalies in the RTT measurements, and indicate them in the Skydive UI, such as the green indication in Figure 1. In this work, we have made a first step towards automatic detection of the Noisy Neighbor Effect with Skydive, using a simple threshold-based approach in an experimental setup. This work can be extended in multiple ways: supporting a more generic and realistic multi-tenant setup; employing deeper analyses, e.g., ML and DL, also on historical data; and exploring additional anomalous cases beyond the Noisy Neighbor Effect.
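    The abstract does not include the detection logic; below is a minimal sketch of a threshold-based RTT anomaly check of the kind described. How the RTT samples are collected (in the paper, via the enhanced Skydive traffic injection and capture) is not shown here, and the node names, sample values, and threshold factor are illustrative assumptions.

```python
# Minimal sketch of threshold-based Noisy Neighbor detection over RTT samples.
# How samples are gathered (e.g., via Skydive) is out of scope; the threshold
# factor, node names, and sample values are illustrative assumptions.
from statistics import mean

def detect_rtt_anomalies(rtt_samples_ms, factor=3.0):
    """Flag node pairs whose recent RTT exceeds `factor` times their baseline RTT."""
    anomalies = []
    for (src, dst), samples in rtt_samples_ms.items():
        half = len(samples) // 2
        baseline = mean(samples[:half])   # older samples serve as the baseline
        recent = mean(samples[half:])     # newer samples are checked against it
        if recent > factor * baseline:
            anomalies.append((src, dst, recent))
    return anomalies

rtt_samples = {
    ("worker-1", "worker-2"): [0.4, 0.5, 0.4, 1.9, 2.3, 2.1],  # degraded pair
    ("worker-1", "worker-3"): [0.4, 0.4, 0.5, 0.4, 0.5, 0.4],  # healthy pair
}
print(detect_rtt_anomalies(rtt_samples))
```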
    Over recent years we have witnessed a massive growth of cloud usage, accelerated by new types of 'born-to-the-cloud' workloads. These new types of workloads are increasingly multi-component, dynamic, and often present highly intensive communication patterns. Massive innovation in Data Center Network (DCN) technologies is required to support the demand, giving rise to new network topologies, new network control paradigms, and new management models. One particularly promising technology candidate for improving DCN efficiency is Optical Circuit Switching (OCS). Several hybrid solutions combining OCS with traditional Electronic Packet Switching (EPS) have been proposed [1, 2], aiming to take advantage of the benefits of the OCS technology (e.g., high bandwidth, low latency, and low power consumption) while leveling out its shortcomings (e.g., slow reconfiguration time, integration with the IP fabric). The first comprehensive work advocating OCS for DCN [1] considered HPC workloads with semi-static communication patterns. Follow-up works, such as Helios [2], proposed new ways of identifying heavy flows, heuristics for computing the circuit configuration, and control hooks for dispatching the traffic over EPS and OCS paths. In yet newer works, e.g. [3], further advances were made -- supporting richer sets of communication patterns, employing Software Defined Networking (SDN) to steer the traffic and to achieve more reactive control planes in anticipation of faster OCS capabilities, and more. We observe that in hybrid solutions the basic approach remains the same -- the network is partitioned between two separate fabrics, one based on OCS and one based on EPS, so that each network flow is handled by one of the fabrics, depending on its properties. In this work, we present a new architecture where optical circuitry does not merely augment the EPS but is properly integrated with it into a coherently managed unified fabric. Our approach is based on the understanding that modern workloads impose diverse traffic demands. Specifically, we identify the abundance of few-to-many and many-to-few communication patterns with multiple dynamic hot spots and observe that such traffic is better served by a tighter integration of OCS and EPS, achieved by introducing composite paths across the OCS-EPS boundary. As a preliminary proof of concept, we evaluated our architecture and compared it to the previously proposed hybrid solutions, considering the known uniform and skewed, as well as few-to-many and many-to-few, demand models. For each traffic pattern, we evaluate both whether it can be met by each of the solutions and, if so, the resulting link utilization. Our preliminary results show a significant improvement in both of these metrics -- feasibility and link utilization. Looking forward, we plan to expand this research and explore a new thread of opportunities for leveraging the reconfiguration capabilities of contemporary OCS, positioning it as a viable DCN technology. This research is partially supported by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 619572 (COSIGN Project).
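    For reference, the baseline hybrid dispatch described above, where each flow is handled by exactly one fabric depending on its properties, can be summarized by a simple classifier; the size threshold and flow records below are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of the baseline hybrid OCS/EPS dispatch described above:
# each flow is assigned to exactly one fabric based on its properties (here,
# its estimated size). The cut-off and flow records are hypothetical.
ELEPHANT_BYTES = 10 * 1024 * 1024  # assumed cut-off separating heavy flows

flows = [
    {"id": "f1", "estimated_bytes": 250 * 1024 * 1024},   # bulk transfer
    {"id": "f2", "estimated_bytes": 40 * 1024},           # short request/response
]

def dispatch(flow):
    """Send heavy (elephant) flows over OCS circuits, the rest over the EPS fabric."""
    return "OCS" if flow["estimated_bytes"] >= ELEPHANT_BYTES else "EPS"

for f in flows:
    print(f["id"], "->", dispatch(f))
```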
    Seamless cloud interoperability is highly desired but not yet easily attainable in the current cloud solutions market. This work tackles one aspect of achieving cloud interoperability, namely, inter-cloud networking. We list the requirements and propose an inter-cloud networking architecture for the case of independent clouds owned by different entities and powered by different cloud management and network virtualization technologies. We then validate the proposed architecture by describing a working implementation for an OpenStack cloud powered by the OpenDaylight Open DOVE SDN solution. Finally, we compare our architecture to existing solutions.
    Overlay network virtualization is quickly gaining traction in today's multi-tenant data centers due to its ability to provide independent virtual networks, at scale, along with complete isolation from the underlying physical network. Despite the benefits, performance degradation due to the imposed per-packet encapsulation overhead is a serious impediment. Mitigation approaches are mostly hardware-based and thus depend on costly networking gear upgrades, while offering less flexibility and longer times to market compared to software solutions. Software optimizations proposed so far are limited in scope, applicability, and interoperability. In this paper we present NoEncap, a software-only optimization capable of almost completely eliminating the overheads while fully preserving the benefits of overlay-based network virtualization.
    Java uses garbage collection (GC) for the automatic reclamation of computer memory no longer required by a running application. GC implementations for Java Virtual Machines (JVM) are typically designed for single-processor machines, and do not necessarily perform well for a server program with many threads running on a multiprocessor. We designed and implemented an on-the-fly GC, based on the algorithm of Doligez, Leroy and Gonthier [13, 12] (DLG), for Java in this environment. An on-the-fly collector, a collector that does not stop the program threads, allows all processors to be utilized during collection and provides uniform response times. We extended and adapted DLG for Java (e.g., adding support for weak references) and for modern multiprocessors without sequential consistency, and added performance improvements (e.g., to keep track of the objects remaining to be traced). We compared the performance of our implementation with a stop-the-world mark-sweep GC. Our measurements sho...
    Mostly concurrent garbage collection was presented in the seminal paper of Boehm et al. With the deployment of Java as a portable, secure, and concurrent programming language, the mostly concurrent garbage collector turned out to be an excellent solution for Java's garbage collection task. The use of this collector has been reported for several modern production Java Virtual Machines, and it has been investigated further in academia. In this paper, we present a modification of the mostly concurrent collector, which improves the throughput, the memory footprint, and the cache behavior of the collector without foiling its other good qualities (such as short pauses and high scalability). We implemented our solution on the IBM production JVM and obtained a performance improvement of up to 26.7%, a reduction in heap consumption of up to 13.4%, and no substantial change in the (short) pause times. The modified algorithm was subsequently incorporated into the IBM production JVM.
    Multithreaded applications with multigigabyte heaps running on modern servers provide new challenges for garbage collection (GC). The challenges for “server-oriented” GC include: ensuring short pause times on a multigigabyte heap while minimizing the throughput penalty, good scaling on multiprocessor hardware, and keeping the number of expensive multicycle fence instructions required by weak ordering to a minimum. We designed and implemented a collector that meets these demands, building on the mostly concurrent garbage collector proposed by Boehm et al. [1991]. Our collector incorporates new ideas into the original collector. We make it parallel and incremental; we employ concurrent low-priority background GC threads to take advantage of processor idle time; we propose novel algorithmic improvements to the basic mostly concurrent algorithm, improving its efficiency and shortening its pause times; and finally, we use advanced techniques, such as a low-overhead work packet mechanism, to enable f...