-
A Framework for SLO, Carbon, and Wastewater-Aware Sustainable FaaS Cloud Platform Management
Authors:
Sirui Qi,
Hayden Moore,
Ninad Hogade,
Dejan Milojicic,
Cullen Bash,
Sudeep Pasricha
Abstract:
Function-as-a-Service (FaaS) is a growing cloud computing paradigm that is expected to reduce the user cost of service over traditional serverful approaches. However, the environmental impact of FaaS has not received much attention. We investigate FaaS scheduling and scaling from a sustainability perspective in this work. We find that the service-level objectives (SLOs) of FaaS and carbon emissions conflict with each other. We also find that SLO-focused FaaS scheduling can exacerbate water use in a datacenter. We propose a novel sustainability-focused FaaS scheduling and scaling framework to co-optimize SLO performance, carbon emissions, and wastewater generation.
Submitted 9 October, 2024;
originally announced October 2024.
-
CASA: A Framework for SLO and Carbon-Aware Autoscaling and Scheduling in Serverless Cloud Computing
Authors:
S. Qi,
H. Moore,
N. Hogade,
D. Milojicic,
C. Bash,
S. Pasricha
Abstract:
Serverless computing is an emerging cloud computing paradigm that can reduce costs for cloud providers and their customers. However, serverless cloud platforms have stringent performance requirements (due to the need to execute short duration functions in a timely manner) and a growing carbon footprint. Traditional carbon-reducing techniques such as shutting down idle containers can reduce performance by increasing cold-start latencies of containers required in the future. This can cause higher violation rates of service level objectives (SLOs). Conversely, traditional latency-reduction approaches of prewarming containers or keeping them alive when not in use can improve performance but increase the associated carbon footprint of the serverless cluster platform. To strike a balance between sustainability and performance, in this paper, we propose a novel carbon- and SLO-aware framework called CASA to schedule and autoscale containers in a serverless cloud computing cluster. Experimental results indicate that CASA reduces the operational carbon footprint of a FaaS cluster by up to 2.6x while also reducing the SLO violation rate by up to 1.4x compared to the state-of-the-art.
Submitted 31 August, 2024;
originally announced September 2024.
-
GreenFaaS: Maximizing Energy Efficiency of HPC Workloads with FaaS
Authors:
Alok Kamatar,
Valerie Hayot-Sasson,
Yadu Babuji,
Andre Bauer,
Gourav Rattihalli,
Ninad Hogade,
Dejan Milojicic,
Kyle Chard,
Ian Foster
Abstract:
Application energy efficiency can be improved by executing each application component on the compute element that consumes the least energy while also satisfying time constraints. In principle, the function as a service (FaaS) paradigm should simplify such optimizations by abstracting away compute location, but existing FaaS systems do not provide for user transparency over application energy consumption or task placement. Here we present GreenFaaS, a novel open source framework that bridges this gap between energy-efficient applications and FaaS platforms. GreenFaaS can be deployed by end users or providers across systems to monitor energy use, provide task-specific feedback, and schedule tasks in an energy-aware manner. We demonstrate that intelligent placement of tasks can both reduce energy consumption and improve performance. For a synthetic workload, GreenFaaS reduces the energy-delay product by 45% compared to alternatives. Furthermore, running a molecular design application through GreenFaaS can reduce energy consumption by 21% and runtime by 63% by better matching tasks with machines.
Submitted 25 June, 2024;
originally announced June 2024.
-
MOSAIC: A Multi-Objective Optimization Framework for Sustainable Datacenter Management
Authors:
Sirui Qi,
Dejan Milojicic,
Cullen Bash,
Sudeep Pasricha
Abstract:
In recent years, cloud service providers have been building and hosting datacenters across multiple geographical locations to provide robust services. However, the geographical distribution of datacenters introduces growing pressure to both local and global environments, particularly when it comes to water usage and carbon emissions. Unfortunately, efforts to reduce the environmental impact of such datacenters often lead to an increase in the cost of datacenter operations. To co-optimize the energy cost, carbon emissions, and water footprint of datacenter operation from a global perspective, we propose a novel framework for multi-objective sustainable datacenter management (MOSAIC) that integrates adaptive local search with a collaborative decomposition-based evolutionary algorithm to intelligently manage geographical workload distribution and datacenter operations. Our framework sustainably allocates workloads to datacenters while taking into account multiple geography- and time-based factors including renewable energy sources, variable energy costs, power usage efficiency, carbon factors, and water intensity in energy. Our experimental results show that, compared to the best-known prior frameworks, MOSAIC can achieve 27.45x speedup and 1.53x improvement in Pareto Hypervolume while reducing the carbon footprint by up to 1.33x, water footprint by up to 3.09x, and energy costs by up to 1.40x. In the simultaneous three-objective co-optimization scenario, MOSAIC achieves a cumulative improvement across all objectives (carbon, water, cost) of up to 4.61x compared to the state of the art.
Submitted 14 November, 2023;
originally announced November 2023.
-
SHIELD: Sustainable Hybrid Evolutionary Learning Framework for Carbon, Wastewater, and Energy-Aware Data Center Management
Authors:
Sirui Qi,
Dejan Milojicic,
Cullen Bash,
Sudeep Pasricha
Abstract:
Today's cloud data centers are often distributed geographically to provide robust data services. But these geo-distributed data centers (GDDCs) have a significant associated environmental impact due to their increasing carbon emissions and water usage, which needs to be curtailed. Moreover, the energy costs of operating these data centers continue to rise. This paper proposes a novel framework to co-optimize carbon emissions, water footprint, and energy costs of GDDCs, using a hybrid workload management framework called SHIELD that integrates machine learning guided local search with a decomposition-based evolutionary algorithm. Our framework considers geographical factors and time-based differences in power generation/use, costs, and environmental impacts to intelligently manage workload distribution across GDDCs and data center operation. Experimental results show that SHIELD can realize 34.4x speedup and 2.1x improvement in Pareto Hypervolume while reducing the carbon footprint by up to 3.7x, water footprint by up to 1.8x, energy costs by up to 1.3x, and a cumulative improvement across all objectives (carbon, water, cost) of up to 4.8x compared to the state-of-the-art.
Submitted 24 August, 2023;
originally announced August 2023.
-
Predicting the Performance-Cost Trade-off of Applications Across Multiple Systems
Authors:
Amir Nassereldine,
Safaa Diab,
Mohammed Baydoun,
Kenneth Leach,
Maxim Alt,
Dejan Milojicic,
Izzat El Hajj
Abstract:
In modern computing environments, users may have multiple systems accessible to them such as local clusters, private clouds, or public clouds. This abundance of choices makes it difficult for users to select the system and configuration for running an application that best meets their performance and cost objectives. To assist such users, we propose a prediction tool that predicts the full performance-cost trade-off space of an application across multiple systems. Our tool runs and profiles a submitted application on a small number of configurations from some of the systems, and uses that information to predict the application's performance on all configurations in all systems. The prediction models are trained offline with data collected from running a large number of applications on a wide variety of configurations. Notable aspects of our tool include: providing different scopes of prediction with varying online profiling requirements, automating the selection of the small number of configurations and systems used for online profiling, performing online profiling using partial runs, thereby making predictions for applications without running them to completion, employing a classifier to distinguish applications that scale well from those that scale poorly, and predicting the sensitivity of applications to interference from other users. We evaluate our tool using 69 data analytics and scientific computing benchmarks executing on three different single-node CPU systems with 8-9 configurations each and show that it can achieve low prediction error with modest profiling overhead.
Submitted 4 April, 2023;
originally announced April 2023.
-
The Next-Generation OS Process Abstraction
Authors:
Rodrigo Siqueira,
Nelson Lago,
Fabio Kon,
Dejan Milojičić
Abstract:
Operating Systems are built upon a set of abstractions to provide resource management and programming APIs for common functionality, such as synchronization, communication, protection, and I/O. The process abstraction is the bridge across these two aspects; unsurprisingly, research efforts pay particular attention to the process abstraction, aiming at enhancing security, improving performance, and supporting hardware innovations. However, given the intrinsic difficulty of implementing modifications at the OS level, recent endeavors have not yet been widely adopted in production-oriented OSes. Still, we believe the current hardware evolution and new application requirements provide favorable conditions to change this trend. This paper evaluates recent research on OS process features, identifying potential evolution paths. We derive a set of relevant process characteristics, and propose how to extend them so as to benefit OSes and applications.
Submitted 24 May, 2022;
originally announced May 2022.
-
Farview: Disaggregated Memory with Operator Off-loading for Database Engines
Authors:
Dario Korolija,
Dimitrios Koutsoukos,
Kimberly Keeton,
Konstantin Taranov,
Dejan Milojičić,
Gustavo Alonso
Abstract:
Cloud deployments disaggregate storage from compute, providing more flexibility to both the storage and compute layers. In this paper, we explore disaggregation by taking it one step further and applying it to memory (DRAM). Disaggregated memory uses network attached DRAM as a way to decouple memory from CPU. In the context of databases, such a design offers significant advantages in terms of making a larger memory capacity available as a central pool to a collection of smaller processing nodes. To explore these possibilities, we have implemented Farview, a disaggregated memory solution for databases, operating as a remote buffer cache with operator offloading capabilities. Farview is implemented as an FPGA-based smart NIC making DRAM available as a disaggregated, network attached memory module capable of performing data processing at line rate over data streams to/from disaggregated memory. Farview supports query offloading using operators such as selection, projection, aggregation, regular expression matching and encryption. In this paper we focus on analytical queries and demonstrate the viability of the idea through an extensive experimental evaluation of Farview under different workloads. Farview is competitive with a local buffer cache solution for all the workloads and outperforms it in a number of cases, proving that a smart disaggregated memory can be a viable alternative for databases deployed in cloud environments.
Submitted 13 June, 2021;
originally announced June 2021.
-
PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-efficient ReRAM
Authors:
Aayush Ankit,
Izzat El Hajj,
Sai Rahul Chalamalasetti,
Sapan Agarwal,
Matthew Marinella,
Martin Foltin,
John Paul Strachan,
Dejan Milojicic,
Wen-mei Hwu,
Kaushik Roy
Abstract:
The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training: both digital and hybrid digital-analog using resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at performing matrix-vector multiplication operations that are prevalent in training. However, they still suffer from inefficiency due to the use of serial reads and writes for performing the weight gradient and update step. A few works have demonstrated the possibility of performing outer products in crossbars, which can be used to realize the weight gradient and update step without the use of serial reads and writes. However, these works have been limited to low precision operations which are not sufficient for typical training workloads. Moreover, they have been confined to a limited set of training algorithms for fully-connected layers only. To address these limitations, we propose a bit-slicing technique for enhancing the precision of ReRAM-based outer products, which is substantially different from bit-slicing for matrix-vector multiplication only. We incorporate this technique into a crossbar architecture with three variants catered to different training algorithms. To evaluate our design on different types of layers in neural networks (fully-connected, convolutional, etc.) and training algorithms, we develop PANTHER, an ISA-programmable training accelerator with compiler support. Our evaluation shows that PANTHER achieves up to $8.02\times$, $54.21\times$, and $103\times$ energy reductions as well as $7.16\times$, $4.02\times$, and $16\times$ execution time reductions compared to digital accelerators, ReRAM-based accelerators, and GPUs, respectively.
Submitted 24 December, 2019;
originally announced December 2019.
-
Cichlid: Explicit physical memory management for large machines
Authors:
Simon Gerber,
Gerd Zellweger,
Reto Achermann,
Moritz Hoffmann,
Kornilios Kourtis,
Timothy Roscoe,
Dejan Milojicic
Abstract:
In this paper, we rethink how an OS supports virtual memory. Classical VM is an opaque abstraction of RAM, backed by demand paging. However, most systems today (from phones to data-centers) do not page, and indeed may require the performance benefits of non-paged physical memory, precise NUMA allocation, etc. Moreover, MMU hardware is now useful for other purposes, such as detecting page access or providing large page translation. Accordingly, the venerable VM abstraction in OSes like Windows and Linux has acquired a plethora of extra APIs to poke at the policy behind the illusion of a virtual address space.
Instead, we present Cichlid, a memory system which inverts this model. Applications explicitly manage their physical RAM of different types, and directly (though safely) program the translation hardware. Cichlid is implemented in Barrelfish, requires no virtualization support, and outperforms VMM-based approaches for all but the smallest working sets. We show that Cichlid enables use-cases for virtual memory not possible in Linux today, and other use-cases are simple to program and significantly faster.
Submitted 19 November, 2019;
originally announced November 2019.
-
A Survey of DevOps Concepts and Challenges
Authors:
Leonardo Leite,
Carla Rocha,
Fabio Kon,
Dejan Milojicic,
Paulo Meirelles
Abstract:
DevOps is a collaborative and multidisciplinary organizational effort to automate continuous delivery of new software updates while guaranteeing their correctness and reliability. The present survey investigates and discusses DevOps challenges from the perspective of engineers, managers, and researchers. We review the literature and develop a DevOps conceptual map, correlating the DevOps automation tools with these concepts. We then discuss their practical implications for engineers, managers, and researchers. Finally, we critically explore some of the most relevant DevOps challenges reported by the literature.
Submitted 18 November, 2019; v1 submitted 11 September, 2019;
originally announced September 2019.
-
PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference
Authors:
Aayush Ankit,
Izzat El Hajj,
Sai Rahul Chalamalasetti,
Geoffrey Ndu,
Martin Foltin,
R. Stanley Williams,
Paolo Faraboschi,
Wen-mei Hwu,
John Paul Strachan,
Kaushik Roy,
Dejan S Milojicic
Abstract:
Memristor crossbars are circuits capable of performing analog matrix-vector multiplications, overcoming the fundamental energy efficiency limitations of digital logic. They have been shown to be effective in special-purpose accelerators for a limited set of neural network applications.
We present the Programmable Ultra-efficient Memristor-based Accelerator (PUMA) which enhances memristor crossbars with general purpose execution units to enable the acceleration of a wide variety of Machine Learning (ML) inference workloads. PUMA's microarchitecture techniques exposed through a specialized Instruction Set Architecture (ISA) retain the efficiency of in-memory computing and analog circuitry, without compromising programmability.
We also present the PUMA compiler which translates high-level code to PUMA ISA. The compiler partitions the computational graph and optimizes instruction scheduling and register allocation to generate code for large and complex workloads to run on thousands of spatial cores.
We have developed a detailed architecture simulator that incorporates the functionality, timing, and power models of PUMA's components to evaluate performance and energy consumption. A PUMA accelerator running at 1 GHz can reach area and power efficiency of $577~GOPS/s/mm^2$ and $837~GOPS/s/W$, respectively. Our evaluation of diverse ML applications from image recognition, machine translation, and language modelling (5M-800M synapses) shows that PUMA achieves up to $2,446\times$ energy and $66\times$ latency improvement for inference compared to state-of-the-art GPUs. Compared to an application-specific memristor-based accelerator, PUMA incurs small energy overheads at similar inference latency and added programmability.
Submitted 29 January, 2019; v1 submitted 29 January, 2019;
originally announced January 2019.
-
A Manifesto for Future Generation Cloud Computing: Research Directions for the Next Decade
Authors:
Rajkumar Buyya,
Satish Narayana Srirama,
Giuliano Casale,
Rodrigo Calheiros,
Yogesh Simmhan,
Blesson Varghese,
Erol Gelenbe,
Bahman Javadi,
Luis Miguel Vaquero,
Marco A. S. Netto,
Adel Nadjaran Toosi,
Maria Alejandra Rodriguez,
Ignacio M. Llorente,
Sabrina De Capitani di Vimercati,
Pierangela Samarati,
Dejan Milojicic,
Carlos Varela,
Rami Bahsoon,
Marcos Dias de Assuncao,
Omer Rana,
Wanlei Zhou,
Hai Jin,
Wolfgang Gentzsch,
Albert Y. Zomaya,
Haiying Shen
Abstract:
The Cloud computing paradigm has revolutionised the computer science horizon during the past decade and has enabled the emergence of computing as the fifth utility. It has captured significant attention from academia, industry, and government bodies. Now, it has emerged as the backbone of the modern economy by offering subscription-based services anytime, anywhere following a pay-as-you-go model. This has instigated (1) shorter establishment times for start-ups, (2) creation of scalable global enterprise applications, (3) better cost-to-value associativity for scientific and high performance computing applications, and (4) different invocation/execution models for pervasive and ubiquitous applications. The recent technological developments and paradigms such as serverless computing, software-defined networking, Internet of Things, and processing at network edge are creating new opportunities for Cloud computing. However, they are also posing several new challenges and creating the need for new approaches and research strategies, as well as the re-evaluation of the models that were developed to address issues such as scalability, elasticity, reliability, security, sustainability, and application models. The proposed manifesto addresses them by identifying the major open challenges in Cloud computing, emerging trends, and impact areas. It then offers research directions for the next decade, thus helping in the realisation of Future Generation Cloud Computing.
Submitted 24 August, 2018; v1 submitted 24 November, 2017;
originally announced November 2017.
-
Software Platforms for Smart Cities: Concepts, Requirements, Challenges, and a Unified Reference Architecture
Authors:
Eduardo Felipe Zambom Santana,
Ana Paula Chaves,
Marco Aurelio Gerosa,
Fabio Kon,
Dejan Milojicic
Abstract:
Making cities smarter helps improve city services and increase citizens' quality of life. Information and communication technologies (ICT) are fundamental for progressing towards smarter city environments. Smart City software platforms potentially support the development and integration of Smart City applications. However, the ICT community must overcome current significant technological and scientific challenges before these platforms can be widely used. This paper surveys the state of the art in software platforms for Smart Cities. We analyzed 23 projects with respect to the most used enabling technologies, as well as functional and non-functional requirements, classifying them into four categories: Cyber-Physical Systems, Internet of Things, Big Data, and Cloud Computing. Based on these results, we derived a reference architecture to guide the development of next-generation software platforms for Smart Cities. Finally, we enumerated the most frequently cited open research challenges, and discussed future opportunities. This survey gives important references for helping application developers, city managers, system operators, end-users, and Smart City researchers to make project, investment, and research decisions.
Submitted 23 July, 2017; v1 submitted 26 September, 2016;
originally announced September 2016.
-
Backtracking algorithms for service selection
Authors:
Yanik Ngoko,
Christophe Cérin,
Alfredo Goldman,
Dejan Milojicic
Abstract:
In this paper, we explore the automation of service compositions. We focus on the service selection problem. In the formulation that we consider, the problem's input is a behavioral composition whose abstract services must be bound to concrete ones. The objective is to find the binding that optimizes the utility of the composition under some service level agreements. We propose a complete solution. First, we show that the service selection problem can be mapped onto a Constraint Satisfaction Problem (CSP). The benefit of this mapping is that the large body of know-how on solving CSPs can be used for the service selection problem. Among the existing techniques for solving CSPs, we consider backtracking. Our second contribution is to propose various backtracking-based algorithms for the service selection problem. The proposed variants are inspired by existing heuristics for the CSP. We analyze the runtime gain of our framework over an intuitive resolution based on exhaustive search. Our last contribution is an experimental evaluation in which we demonstrate that there is an effective gain in using backtracking instead of some comparable approaches. The experiments also show that our proposal can be used for finding optimal solutions in real time on small and medium service compositions.
Submitted 6 February, 2014;
originally announced February 2014.