Award Winner 2023
AI and Co-Simulation Driven Resource Management in Fog Computing Environments by Shreshth Tuli
Press Release Announcement

Abstract
The title of this thesis, AI and Co-Simulation Driven Resource Management in Fog Computing Environments, covers the three main aspects of the work presented here. First, we discuss what we do, which is making resource management decisions. Second, we discuss where we make such decisions, namely Fog computing environments. Third, we discuss how we make these decisions, that is, through Artificial Intelligence (AI) and Co-Simulation based methods.

We consider the Fog computing paradigm, an emerging paradigm encompassing a diverse spectrum of heterogeneous, distributed compute nodes, widely considered the future of computing. These nodes range from resource-abundant cloud virtual machines to resource-limited compute hardware close to the user. As cloud nodes may be geographically distant and multiple network hops away from users, they tend to incur high communication latency. On the other hand, devices only a few hops from the user, also referred to as the edge of the network, provide reduced latency but come with tight compute and memory constraints.

In this work, we aim to improve service quality by making intelligent resource management decisions for Fog environments. This is hard given highly dynamic modern applications and the volatile resource characteristics of such systems. To make intelligent decisions, we consider a Fog environment consisting of broker and worker nodes. The former make resource management decisions, such as when to provision workers, where to place incoming tasks, and how to prevent and recover from failures to ensure service resilience. The latter execute finite-running tasks and return the results to users. In our setup, we assume that users interact with the Fog environment through gateway devices, such as smartphones or tablets, and send or receive data through sensors and actuators, such as microphones or cameras.
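To make the broker/worker split concrete, the following is a minimal sketch (not the thesis's actual scheduler) of a broker that places an incoming task on the lowest-latency worker with spare capacity, falling back to a distant cloud node when the nearby edge device is full. All class and field names here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    """A compute node (cloud VM or edge device) with finite capacity."""
    name: str
    cpu_capacity: float   # available CPU (cores)
    latency_ms: float     # network latency to the user gateway
    load: float = 0.0     # CPU currently in use

    def can_host(self, task_cpu: float) -> bool:
        return self.load + task_cpu <= self.cpu_capacity

@dataclass
class Broker:
    """Makes placement decisions: prefer the lowest-latency worker that fits."""
    workers: list

    def place(self, task_cpu: float):
        candidates = [w for w in self.workers if w.can_host(task_cpu)]
        if not candidates:
            return None  # here, provisioning a new worker or queueing would be needed
        best = min(candidates, key=lambda w: w.latency_ms)
        best.load += task_cpu
        return best

# Edge node: low latency, tight capacity; cloud VM: high latency, abundant capacity.
broker = Broker([Worker("edge", 2.0, 5.0), Worker("cloud", 16.0, 80.0)])
```

A heuristic like this is myopic by design; the thesis's contribution is to replace such hand-coded rules with AI models informed by co-simulation.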
The objective is to improve metrics of interest, such as energy consumption, the average response time of tasks, the fraction of task deadline violations, and the operational cost of the environment. We refer to all these as Quality of Service (QoS) metrics; optimizing them is crucial for both end-users and service providers. To make intelligent decisions, we develop novel data-driven strategies. We leverage and improve upon AI-based methods for their speed, accuracy, and ability to identify patterns in data that are hard to encode manually. However, AI methods are typically oblivious to the system characteristics. To overcome this, we encode system knowledge in a co-simulator, a digital replica of the physical infrastructure. Such a simulator allows us to quickly generate additional out-of-distribution data for robust model training, run performance tests on decisions of interest from an AI model, and perform long-term QoS estimation to eschew myopic decision-making. These advances allow us to provide significantly higher QoS than prior methods for resource management in Fog environments.
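The decision loop described above, where candidate decisions from an AI model are scored in the co-simulator over a multi-step horizon before one is committed, can be sketched as follows. The simulator here is a toy deterministic stand-in, not the thesis's actual digital replica, and all function names are assumptions for illustration.

```python
import random

def simulate_qos(decision: str, horizon: int = 10) -> float:
    """Toy stand-in for the co-simulator: estimate the QoS of a candidate
    decision over a multi-step horizon (higher is better). A real co-simulator
    would replay the system dynamics on a digital replica of the
    infrastructure; a seeded pseudo-random proxy keeps this example
    deterministic and self-contained."""
    rng = random.Random(sum(map(ord, decision)))  # stable per-decision seed
    return sum(rng.uniform(0.0, 1.0) for _ in range(horizon))

def choose_decision(candidates: list) -> str:
    """Score each AI-proposed candidate in the simulator and commit the one
    with the best estimated long-term QoS, avoiding myopic choices."""
    return max(candidates, key=simulate_qos)

# Hypothetical candidates an AI scheduler might propose for the next interval.
candidates = ["migrate_task_A", "scale_up_worker_2", "no_op"]
best = choose_decision(candidates)
```

The design point is the separation of concerns: the AI model proposes, the system-aware simulator evaluates, and only simulator-vetted decisions reach the physical infrastructure.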