
1 Introduction

The edge computing paradigm is emerging as a high-performance computing environment comprising a large-scale, heterogeneous collection of autonomous systems with a flexible computational architecture [1,2,3,4,5,6]. It provides the tools and technologies to build data- or compute-intensive parallel applications at much more affordable prices than traditional parallel computing techniques. Hence, there has been increasing growth in active research on edge computing, covering scheduling, placement, energy management, privacy and policy, security, etc. Workflow scheduling in cloud and edge environments has recently drawn enormous attention thanks to its wide application in both scientific and economic areas. A workflow is usually formulated as a Directed Acyclic Graph (DAG) of n tasks subject to precedence constraints. Scheduling workflows over an edge environment refers to matching tasks onto edge services created on edge nodes. In multi-objective scheduling, objectives can conflict: for execution-time minimization, fast services are preferable to slow ones, but fast services are usually more expensive, so minimizing execution time may contradict the cost-reduction objective. It is also widely acknowledged that scheduling multi-task workflows on distributed platforms is NP-hard, so yielding optimal schedules through traversal-based algorithms is extremely time-consuming. Fortunately, heuristic and meta-heuristic algorithms with polynomial complexity are able to produce approximate or near-optimal schedules at the cost of an acceptable optimality loss [7,8,9,10,11,12]. Good examples of such algorithms are multi-objective particle swarm optimization (MOPSO) and the non-dominated sorting genetic algorithm-II (NSGA-II).

Recently, as novel machine learning algorithms become increasingly versatile and powerful, considerable research effort has been devoted to using reinforcement learning (RL) and Q-learning-based algorithms [13,14,15] to find near-optimal workflow scheduling solutions. Nevertheless, most existing contributions in this direction focus on scheduling workflows over centralized clouds; how to apply Q-learning-based algorithms and models to scheduling workflows upon distributed edge computing platforms has yet to be clearly addressed. In this work, we propose a DQN-based multi-workflow scheduling method. The proposed model takes the probability mass functions (PMFs) of historical performance data of edge services as inputs and improves scheduling plans by optimizing the probability that a workflow satisfies its completion-time constraint. We conduct a simulated experiment and compare our method with baseline algorithms. The results show that our method outperforms the baselines in terms of workflow completion time.

2 Related Work

Scheduling multi-workflows upon distributed infrastructures, e.g., grids, clouds, and edge, is usually known to be NP-hard, and thus traversal-based algorithms can be ineffective in terms of computational complexity. Instead, heuristic and meta-heuristic procedures with polynomial complexity can yield high-quality, sub-optimal solutions at the cost of a certain level of optimality degradation. For example, [16] leveraged a multi-objective bio-inspired procedure (MOBFOA) that augments the traditional BFOA with Pareto-optimal fronts; their method targets the reduction of flow time, completion duration, and operational cost. [17] considered a multi-objective genetic optimization (BOGA) and optimized both electricity consumption and DAG reliability. [18] considered a GA augmented with the Efficient Tune-In (GA-ETI) mechanism for the optimization of turnaround time. [19] employed a non-dominated-sorting-based hybrid PSO approach aimed at minimizing both turnaround time and cost. [20] introduced a fuzzy-dominance-sort-based heterogeneous finishing-time minimization approach for optimizing both the cost and the turnaround time of DAGs executed on IaaS clouds.

Recently, deep reinforcement learning (DRL) methods have shed new light on the problem we are interested in [21,22,23,24,25,26,27]. It has been shown that multi-agent training methods can be effective for multi-constraint and multi-objective optimization problems. For example, [28] employed a sequential cooperative game approach for heterogeneous workflow scheduling. [29] developed a reinforcement-learning-based method for multi-DAG execution with user-defined priorities specified at different times. [30] proposed a distributed load management and access control approach for the SaaS environment using a fuzzy game-theoretic model. [31] proposed a modified Q-learning method for turnaround-time reduction and load balancing based on a weighted fitness value function. However, Q-learning-based algorithms and models intended for edge-infrastructure-based workflow scheduling are very rare. A highly efficient reinforcement-learning-based approach for scheduling and managing multi-workflows upon distributed, mobile, and resource-constrained edge services is therefore in high demand.

Fig. 1. Edge computing environment

3 Model and System

3.1 System Architecture

As shown in Fig. 1, an edge computing environment can be seen as a collection of edge servers, usually deployed near base stations. In this way, users can offload compute-intensive and latency-sensitive applications, e.g., Augmented Reality (AR), Virtual Reality (VR), and Artificial Intelligence (AI), to edge servers. Within an edge computing environment there exist n users, denoted by \(U=\{u_{1},u_{2},...,u_{n}\}\), and m base stations, denoted by \(B=\{b_{1},b_{2},...,b_{m}\}\). Each user has an application to be executed, and a user's mobile device can offload tasks onto edge servers near a base station through a wireless access point. For generality, we regard mobile applications as workflows, denoted by a directed acyclic graph (DAG) \(W=(T,D)\), where \(T=\{t_{1},t_{2},...,t_{n}\}\) represents a set of tasks. Tasks are of multiple types with different input-data sizes. \(D=\{d_{i,j} | i,j\in [1,n]\}\) represents a set of precedence dependencies, where \(d_{i,j}=1\) means \(t_j\) can be executed only after \(t_i\) is completed, and \(d_{i,j}=0\) otherwise. \(S_{i}=\{s_{1},s_{2},...,s_{n}\}\) represents the list of servers whose signal coverage reaches user i, i.e., the servers onto which user i can offload tasks.

Users offload tasks to the edge via wireless access points. The action profile of user \(u_i\) can be expressed as \(a_{i}=\{s_{1},s_{2},...,s_{m}\}\), the set of servers it offloads to. For a server \(s_{j}\), the list of users who offload tasks to it is \(UL_{j}=\{i|s_{j}\in a_{i}\}\). Given an action profile \(A=\{a_{1},a_{2},...,a_{n}\}\) of all users, the uplink data rate of the wireless channel from user \(u_{i}\) to server \(s_{j}\) can be estimated by

$$\begin{aligned} R_{i,j}(A)=B\cdot log_{2}(1+\frac{p_{i}g_{i,j}}{\sum _{k\in UL_{j}}p_{k}g_{k,j}+\sigma }) \end{aligned}$$
(1)

where B is the channel bandwidth, \(p_{i}\) the transmit power of user \(u_{i}\), \(g_{i,j}\) the channel gain from user \(u_{i}\) to server \(s_{j}\), and \(\sigma \) the background noise power. It can be seen from this equation that if too many users offload their tasks to the same server, the uplink data rate decreases, which in turn lowers offloading efficiency.
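The rate estimation of Eq. (1) can be sketched as follows. This is an illustrative helper, not code from the paper; the function name, the dictionary-based inputs, and the choice to count only the other users in \(UL_j\) as interference are our assumptions.

```python
import math

def uplink_rate(bandwidth, p, g, ul_j, i, j, noise):
    """Estimate the uplink data rate R_{i,j}(A) of user i to server j (Eq. 1).

    p[k] is the transmit power of user k; g[(k, j)] is the channel gain from
    user k to server j; ul_j is UL_j, the list of users offloading to server j.
    Users in UL_j other than i are treated as interference sources (assumption).
    """
    interference = sum(p[k] * g[(k, j)] for k in ul_j if k != i)
    sinr = p[i] * g[(i, j)] / (interference + noise)
    return bandwidth * math.log2(1 + sinr)
```

With illustrative numbers, adding more users to \(UL_j\) lowers the rate, matching the observation above.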

Assume user \(u_{i}\) offloads its task \(t_{j}\) to server \(s_{k}\). According to Eq. (1), the transmission time for offloading the input data of size \(C_{i,j,k}\) can be estimated by

$$\begin{aligned} TT_{i,j,k}(A)=\frac{C_{i,j,k}}{R_{i,k}(A)}=\frac{C_{i,j,k}}{B\log _{2}(1+\frac{p_{i}g_{i,k}}{\sum _{k'\in UL_{k}}p_{k'}g_{k',k}+\sigma })} \end{aligned}$$
(2)

We assume that all wireless channels obey the quasi-static block fading rule [32], i.e., the state of the channel remains unchanged during a transmission. Thus, the probability distribution of the completion time of the task is

$$\begin{aligned} T_{i,j,k} = TT_{i,j,k}(A)+TE_{i,j,k} \end{aligned}$$
(3)
$$\begin{aligned} PMF_{i,j,k}^{TE}(t)=Prob(TE_{i,j,k}) \end{aligned}$$
(4)
$$\begin{aligned} Prob(T_{i,j,k})=PMF_{i,j,k}^{T}(t)=PMF_{i,j,k}^{TE}(t- TT_{i,j,k}) \end{aligned}$$
(5)

where \(TE_{i,j,k}\) is the historical execution time and \(PMF^{TE}_{i,j,k}(t)\) denotes the probability mass function of the historical execution time.
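Equation (5) says the completion-time PMF is simply the execution-time PMF shifted right by the (deterministic) transmission time. A minimal sketch, assuming the PMF is stored as a dict mapping time values to probabilities (our representation, not the paper's):

```python
def completion_time_pmf(exec_pmf, tt):
    """Shift the execution-time PMF by the transmission time TT (Eqs. 3-5).

    exec_pmf maps execution time t -> Prob(TE = t). Since the completion
    time is T = TT + TE, its PMF is the execution PMF shifted right by TT.
    """
    return {t + tt: prob for t, prob in exec_pmf.items()}
```

For example, an execution-time PMF of {2: 0.5, 3: 0.5} with a transmission time of 1 yields a completion-time PMF of {3: 0.5, 4: 0.5}.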

3.2 Problem Formulation

Based on the above system model, we are interested in knowing the highest probability of meeting the completion-time constraints. The resulting scheduling problem can be described as follows:

$$\begin{aligned} \max f=Prob_{avg}=\frac{1}{N}\sum _{i=1}^N \Pr (CT_{i}\le C_{i}^g) \end{aligned}$$
(6)

subject to,

$$\begin{aligned} i\in [1,N],CT_{i}\ge 0,C_{i}^g\ge 0 \end{aligned}$$
(7)

where \(C_i^g\) is a completion-time threshold for user \(u_{i}\) and \(CT_{i}\) the actual completion time of a user’s workflow.

4 Our Approach

4.1 Decomposition of the Global Constraint

To evaluate the effectiveness of the agents' actions during training, we first have to decompose the global constraint into local ones. Given a workflow with n tasks, denoted by \(W=\{t_{1},t_{2},...,t_{n}\}\), and a global completion-time constraint \(C^g\), the local constraints of the subtasks are represented by \(C^l=\{C_{1}^l,C_{2}^l,...,C_{n}^l\}\). We divide the global constraint in proportion to the expected completion time of each part, in the following steps:

  1. Obtain the list of servers whose signal coverage reaches user k, denoted by \(S_{k}=\{s_{1},s_{2},...,s_{n}\}\).

  2. For task \(t_{i}\), its completion time on server \(s_{j}\) is represented by the PMF mentioned above. The expected completion time \(e_{i,j}\) is estimated as the median of this PMF, i.e., the value satisfying \(\sum _{x\le e_{i,j}}PMF(x)=0.5\).

  3. Since task \(t_{i}\) has multiple candidate servers \(S_k\) onto which it can be scheduled, its expected completion time is \(E_i^t=avg(e_i)\), where \(e_i=\{e_{i,1},e_{i,2},...,e_{i,n}\}\).

  4. Any part \(p_{g}\) consists of tasks \(T_g=\{t_1,t_2,...,t_n\}\). The expected completion time of this part is thus \(E_g^p=\max (E^t_{t_{1}},E^t_{t_{2}},...,E^t_{t_{n}})\).

  5. Eventually, we divide the global constraint into local ones as follows:

    $$\begin{aligned} C_{i}^l=C^g\cdot \frac{E_{i}^p}{\sum _{j=1}^{n}E_{j}^p} \end{aligned}$$
    (8)
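The steps above can be sketched as follows. The data layout (parts as lists of tasks, each task as a list of per-candidate-server PMF dicts) and all function names are our illustrative assumptions.

```python
def median_of_pmf(pmf):
    """e_{i,j}: the smallest t whose cumulative probability reaches 0.5 (step 2)."""
    acc = 0.0
    for t in sorted(pmf):
        acc += pmf[t]
        if acc >= 0.5:
            return t

def decompose_constraint(global_c, parts):
    """Split the global deadline C^g into local deadlines C_i^l (Eq. 8).

    parts is a list of workflow parts; each part is a list of tasks; each
    task is a list of per-candidate-server PMFs (dicts t -> probability).
    """
    expected = []
    for part in parts:
        # E_i^t: average of per-server medians (step 3)
        task_exps = [sum(median_of_pmf(s) for s in task) / len(task)
                     for task in part]
        # E_g^p: the slowest task dominates the part (step 4)
        expected.append(max(task_exps))
    total = sum(expected)
    # Step 5 / Eq. (8): split C^g in proportion to expected part durations
    return [global_c * e / total for e in expected]
```

For instance, two single-task parts with expected durations 2 and 4 split a global deadline of 6 into local deadlines of 2 and 4.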

4.2 Deep-Q-Network-based Solution to the Workflow Scheduling Problem

As mentioned earlier, we employ a DQN to solve the optimization formulation given above. In DQN, the value function updated by temporal difference can be expressed as:

$$\begin{aligned} Q(s,a)=(1-\alpha )Q(s,a)+\alpha [R(a)+\gamma \max _{a'\in A}Q(s',a')] \end{aligned}$$
(9)

where \(Q(s,a)\) is the state-action value function at the current state, \(Q(s',a')\) the state-action value function at the next state, \(\alpha \) the update step size, R(a) the reward derived from the PMF of the workflow completion time according to (12), and \(\gamma \) the reward decay factor. The loss function of the deep Q-network can be computed by

$$\begin{aligned} L(\theta )=E_{s,a,r,s'}[(Q^{*}(s,a|\theta )-y)^2] \end{aligned}$$
(10)
$$\begin{aligned} y=R(a)+ \gamma \max _{a'\in A}Q^*(s',a') \end{aligned}$$
(11)

where y denotes the target value produced by the target Q-network, whose parameters are periodically copied from the evaluation Q-network \(Q^*\). The DQN procedure is shown in Algorithm 1.

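For intuition, the temporal-difference update of Eq. (9) can be illustrated in tabular form; the DQN replaces this table with a neural network, but the update rule is the same. This is a didactic sketch with illustrative names, not the paper's implementation.

```python
def td_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """One temporal-difference update of the state-action value (Eq. 9).

    Q is a dict mapping (state, action) -> value; unseen pairs default to 0.
    alpha is the update step size and gamma the reward decay factor.
    """
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = ((1 - alpha) * Q.get((s, a), 0.0)
                 + alpha * (reward + gamma * best_next))
    return Q[(s, a)]
```

Starting from an empty table, a reward of 1.0 moves \(Q(s,a)\) from 0 to \(\alpha \cdot 1.0 = 0.1\), and repeated updates converge toward the discounted return.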

The DQN environment includes components for environment observation, action space, policy setting, and reward design [33]. Note that the first three components can be implemented using the standard DQN setting, while the reward design should be developed based on the optimization formulation and the constraint decomposition given in the previous sections. The reward function is designed as:

$$\begin{aligned} R_{i}(a)=Pr(X\le C_i^l)^3 \end{aligned}$$
(12)

where \(C_i^l\) is based on the decomposition of the global constraint given in (8).
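Given the completion-time PMF of a task and its local deadline, the reward of Eq. (12) is the cube of the probability of meeting that deadline. A minimal sketch, again using a dict-based PMF representation of our own choosing:

```python
def reward(pmf, local_c):
    """Reward R_i(a) = Pr(X <= C_i^l)^3 (Eq. 12).

    pmf maps completion time t -> probability. Cubing the probability
    sharpens the reward toward schedules likely to meet the deadline.
    """
    prob = sum(p for t, p in pmf.items() if t <= local_c)
    return prob ** 3
```

For example, a PMF of {1: 0.5, 3: 0.5} with a local deadline of 2 gives a meet probability of 0.5 and hence a reward of 0.125.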

Fig. 2. An example of edge servers with their coverage areas and edge users in the Melbourne CBD

5 Case Study

In this section, we conduct simulative case studies to demonstrate the effectiveness of our method in terms of workflow completion time, network loss value, and convergence speed. The types of servers, workflows, and tasks are randomly generated. We assume that edge servers and users are located according to the position dataset of [34], as illustrated in Fig. 2. Edge servers are of 3 different types, i.e., type1, type2, and type3, in terms of their resource configuration and performance. User applications are expressed as multiple workflows, as given in Fig. 3, where each workflow task executes a Gauss–Legendre calculation with 8, 16, or 32 million decimal digits. The historical execution times of Gauss–Legendre calculations over the different types of edge servers are based on data from [35], shown in Fig. 4. For comparison, we evaluate our proposed method against existing methods, i.e., NSPSO [36] and NSGA [37].

Fig. 3. Five typical workflow templates

Fig. 4. The historical task execution time of the Gauss–Legendre calculation on different types of edge servers

5.1 Experiment Configuration

We test our method and its peers on a workstation with an Intel Core i7 CPU @ 2.80 GHz, an NVIDIA GeForce GTX 1050 Ti, and 8 GB RAM. Table 1 shows the basic parameters used in the experiments.

Table 1. The parameters used in the experiment

5.2 Performance Evaluation

Based on the above configurations and datasets, we repeatedly invoked our proposed method to schedule workflows based on performance data of edge servers measured at 3 different time periods, as given in Fig. 4. It can be seen from Figs. 5 and 6 that the network loss decreases rapidly over time and the probability of satisfying the global constraint increases with iterations.

Fig. 5. The loss of the evaluation network

Fig. 6. The probability of satisfying the global constraint

As can be seen from Fig. 7, our method clearly outperforms the baseline algorithms at all 3 time periods in terms of workflow completion time.

Fig. 7. Workflow completion time at 3 different time periods

6 Conclusion

In this work, we propose a novel probability-mass-function- and DQN-based approach to scheduling multi-workflows in a distributed edge computing environment. The proposed method handles the time-varying performance of edge services through probability mass functions of historical performance data and leverages a deep Q-network framework to yield high-quality workflow scheduling plans. To validate the approach, we conduct a simulative case study based on a well-known edge-server-position dataset and demonstrate that our proposed method beats its peers in terms of scheduling performance.