Neural Comput & Applic
DOI 10.1007/s00521-016-2448-8
ORIGINAL ARTICLE
Fault tolerance aware scheduling technique for cloud computing
environment using dynamic clustering algorithm
Shafi'i Muhammad Abdulhamid1,3 • Muhammad Shafie Abd Latiff1 •
Syed Hamid Hussain Madni1 • Mohammed Abdullahi1,2
Received: 17 September 2015 / Accepted: 1 July 2016
© The Natural Computing Applications Forum 2016
Abstract In cloud computing, resources are dynamically provisioned and delivered to users transparently and automatically on demand. Task execution failure is no longer accidental but a common characteristic of the cloud computing environment. In recent times, a number of intelligent scheduling techniques have been used to address task scheduling issues in the cloud without much attention to fault tolerance. In this research article, we propose a dynamic clustering league championship algorithm (DCLCA) scheduling technique for fault tolerance awareness in cloud task execution, which reflects the currently available resources and reduces the untimely failure of autonomous tasks. Experimental results show that our proposed technique produces a remarkable reduction in task failure, as measured by the failure rate. They also show that the
DCLCA outperformed the MTCT, MAXMIN, ant colony
optimization and genetic algorithm-based NSGA-II by
producing lower makespan, with improvements of 57.8, 53.6, 24.3 and 13.4 % in the first scenario and 60.0, 38.9, 31.5 and 31.2 % in the second scenario, respectively. Considering the experimental results, DCLCA provides better-quality fault tolerance aware scheduling that will help to improve the overall performance of the cloud environment.

Keywords Dynamic clustering · Cloud scheduling · Fault tolerance · Task scheduling · League championship algorithm

✉ Shafi'i Muhammad Abdulhamid
shafii.abdulhamid@futminna.edu.ng

Muhammad Shafie Abd Latiff
shafie@utm.my

Syed Hamid Hussain Madni
madni4all@yahoo.com

Mohammed Abdullahi
abdullahilwafu@abu.edu.ng

1 Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
2 Department of Mathematics, Ahmadu Bello University, Zaria, Kaduna State, Nigeria
3 Department of Cyber Security Science, Federal University of Technology Minna, Niger State, Nigeria
1 Introduction
Failures are to be expected in cloud computing environment. Cloud resources are known to experience fluxes in
their performance delivery [1]. Thus, fault-tolerant
scheduling technique that takes care of performance
variations, resource fluctuations and failures in the environment is important [2]. As task applications increasingly utilize cloud resources over long periods, tasks will unavoidably encounter a growing number of component failures [3]. When task failures occur, they affect the execution of the tasks scheduled on the failed components. Hence, a fault-tolerant mechanism is essential in
clouds. Fault tolerance is the capability of the cloud
scheduler to safeguard and protect the delivery of projected tasks even with the occurrence of failures in the
clouds system [4]. Providing fault tolerance in a cloud
computing system, while optimizing resource scheduling
and task execution is a demanding task especially for
cloud developers. Cloud scheduling mechanisms are
expected to have fault-tolerant components that identify
failures and resolve them within the shortest possible
time. These supportive components allow tasks to be
scheduled on the cloud resources even in case of component breakdown without stopping the applications. In
this NP-hard problem, exact solution techniques are at
least exponential and a cloud provider has to depend on
fairly estimated results obtained in a suitable period based
on intelligent algorithms [5].
Fault tolerance awareness has been identified as one
of the main issues to ensure reliability, robustness and
availability of important services as well as running of
applications in the cloud computing system. Task failure
may be as a result of many factors such as overloaded
RAM, bandwidth shortage or power failure to mention
but a few. Fault tolerance depends on either time or
hardware redundancy so as to mask task or component
failures. Time redundancy involves the re-execution of
failed task after the malfunction has been identified. It
can be optimized through the application of checkpointing techniques, but even then it still imposes a considerable delay. In many task-critical systems, hardware redundancy has been frequently deployed in the form of task replication to provide fault tolerance, avoiding delay and sustaining fixed targets. Both methods have drawbacks: re-execution needs extra time, and replication needs extra resources, particularly energy. This compels a trade-off between time and hardware redundancy; in IaaS cloud computing systems, replication is mostly preferred because response time is normally critical [6].
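The trade-off above can be made concrete with a small numerical sketch (ours, not part of the paper): time redundancy stretches expected completion time geometrically with the failure probability, while hardware redundancy keeps response time near the base time at a linear resource cost.

```python
def expected_time_reexecution(t, p_fail):
    """Time redundancy: re-execute a failed task until it succeeds.
    The number of attempts is Geometric(1 - p_fail), so the expected
    completion time is t / (1 - p_fail)."""
    return t / (1.0 - p_fail)

def replication_success(p_fail, replicas):
    """Hardware redundancy: run `replicas` concurrent copies; the task
    fails only if every copy fails. Response time stays near t, but the
    resource cost grows linearly with `replicas`."""
    return 1.0 - p_fail ** replicas

# A 10-s task with a 20 % failure rate: retrying costs 12.5 s on average,
# while two replicas succeed 96 % of the time at double the resource use.
print(expected_time_reexecution(10.0, 0.2))   # 12.5
print(replication_success(0.2, 2))            # 0.96
```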
The league championship algorithm (LCA) is an intelligent algorithm first proposed by Kashan [7]. To form a synthetic championship setting, a number of idealized rules are followed, yielding a promising computational intelligence algorithm modeled on a sports championship round-robin timetable. To evaluate the performance of the LCA, Kashan and Karimi [8] used five benchmark test functions: the Sphere, Rastrigin, Rosenbrock, Ackley and Schwefel functions. In comparison with other state-of-the-art intelligent algorithms such as particle swarm optimization, simulation results show that the LCA is a reliable optimization algorithm that can converge speedily to the global optimum. This feature, together with its reliability in avoiding local trappings, makes it a natural choice for fault tolerance scheduling in the cloud environment. Consequently,
Abdulhamid et al. [9] present LCA-based scheduling
technique for global optimization of tasks scheduling in
the cloud system. Also, Abdulhamid et al. [10] present a survey of the LCA optimization method, detailing the real-world areas in which the algorithm has been effectively applied, and explore the prospects and challenges of the technique.
Algorithmic complexity concerns how fast a certain algorithm performs, taking into account the total operations, input parameters, resources and time [11, 12]. The LCA has relatively low complexity in terms of the number of iterations and total operations to be performed, compared with other state-of-the-art intelligence techniques. This makes it very easy to implement.
The introduction of the DCLCA technique became
important as an alternative to the existing approaches
because of its strong local search and faster convergence
rate. It is versatile in reducing trappings into local optima
when solving complex multimodal problems. Additionally, it has few input parameter settings which reduce the
uncertainty in choosing the correct values that will lead to
fast convergence and accurate results. Furthermore, many
of the existing intelligent approaches did not consider
fault tolerance strategies and parameters in designing their
task scheduling algorithms in the cloud computing
system.
The aim of this work is to design a dynamic clustering league championship algorithm (DCLCA) for the task
scheduling problem with fault tolerance awareness in cloud
computing services. The remaining parts of the article are
organized as follows. Section 2 presents the related works,
which is divided into dynamic intelligent algorithms used
in the cloud environment and fault tolerance aware intelligent algorithms in cloud. Section 3 describes the fitness
function of the DCLCA. Section 4 details the fault tolerance scheduling components, task migration and the performance metric. Section 5 shows the design of the
proposed DCLCA optimization technique. Section 6
chronicles the experimental setup, while Sect. 7 explains
the results and discussion from two different scenarios.
Then lastly, Sect. 8 presents the summary and conclusions
of our findings.
2 Related works
Fault tolerance awareness in the cloud concerns the mechanisms required to allow a technique to endure task execution faults lingering in the system. The merits of developing fault tolerance techniques in cloud computing include failure avoidance, healing, cost efficiency and superior performance metrics. When various instances of cloud tasks begin to execute on numerous VMs and some of the servers fail, there is a fault, and it is normally handled using a fault tolerance mechanism [13]. A number of factors may lead to the failure of a server instance and consequently to task failure. Often, one failure event stimulates another. These factors include:
hardware malfunction, operating system crashes, network
partitions, power outages and unforeseen software behavior, any of which can lead to the failure of a server request. Some fault tolerance mechanisms have recently been integrated with cloud computing scheduling [14, 15]. These include retry, healing, resubmission, replication, software rejuvenation, masking and migration, to mention but a few; however, most of them are prone to heavy overhead and sometimes lead to local trappings. In this section, we review some related literature that applied
intelligent optimization techniques for solving the
dynamic task scheduling problem in the cloud computing
environment. We also surveyed related works that try to
address the issues of fault tolerance using the intelligent
algorithms in this environment.
2.1 Dynamic intelligent algorithms in cloud
scheduling
Dynamic scheduling techniques are designed to group independent tasks into a set of suitably sized task clusters/partitions using task characteristics, focusing on priority or a partition [16–18]. They are used to organize the scheduling of clustered tasks over limited heterogeneous cloud resources. A heuristic is applied to categorize the task clusters, with knowledge of the current limited resource states in the cloud environment [16, 19–22]. This approach also efficiently combines task clustering and mapping into a joint resource allocation technique to improve the computing accessibility of the resources.
Gangeshwari et al. [23] introduce a dynamic hypercubic peer-to-peer cloud (HPCLOUD), a structured peer-to-peer framework implemented on a cloud computing system, on which the MapReduce method is used for task scheduling. In addition, fault tolerance can be attained on the HPCLOUD architecture. Experimental outcomes show that the proposed HPCLOUD method demonstrates superior performance in terms of response time as the number of files increases. However, the dynamicity of HPCLOUD applies only at the initial level of scheduling and is not integrated to cover all levels. In addition, only the response time is considered in measuring the fault tolerance parameter, while other important parameters are not taken into account. An intelligent technique based on GA is also presented to solve the global task scheduling problem [24]. The technique is based on the Pareto dominance relationship, called NSGA-II, giving not a single optimal result but a set of results that are not dominated by one another. The related outcomes show the efficiency of the presented technique and GA for small- and medium-sized scheduling problems. However, the experimental results did not demonstrate performance on large and massive cloud-based scheduling problems, and heterogeneity, dynamicity and likely local trappings are not considered.
Using an active replication technique, a dynamic and reliability-driven real-time fault-tolerant scheduling algorithm (DRFACS), with a greedy algorithm for the case of enormous resource failures, is put forward by Ling et al. [25]. It aims at increasing reliability by dynamically assigning dependent, non-preemptive, non-periodic real-time tasks, trying to improve the QoS throughout the scheduling process. The experimental results show that when the computing and communication rate (CCR) value is very small, DRFACS's scheduling capability is better than that of FTSA and FTBAR. When the CCR value increases, DRFACS's performance steadily declines, and the FTSA and FTBAR schemes do better than DRFACS. This shows that the CCR has a strong impact on DRFACS, FTSA and FTBAR reliability. As the CCR grows, DRFACS also steadily improves reliability, showing that an increase in CCR will also increase the reliability of the system. However, the dynamicity of the DRFACS technique is prioritized only for critical tasks. Tawfeek et al. [26] put forward a dynamic cloud job scheduling technique using the ant colony optimization (ACO) intelligent optimization method. A random optimization search method is adopted for this approach and is used for mapping incoming jobs to resources. The experimental results indicate that the proposed ACO method outperforms the FCFS and RR algorithms. However, dynamic scheduling in this technique is not a priority in achieving the goal of fault tolerance. Therefore, because of the constant changes in the state of heterogeneous cloud resources, there is an urgent need for a dynamic clustering scheduling technique that reflects these changes. The dynamic clustering technique also helps in managing time redundancy, which may otherwise lead to task failures at runtime.
2.2 Some recent task migration strategies in cloud
A task replication technique called heterogeneous earliest finish time (HEFT) is put forward for fault-tolerant task execution in the cloud. The proposed HEFT heuristic makes thorough use of additional resources for the duration of task replication; it attains good fault tolerance compared with common replication. HEFT completes small workflows within a short makespan [27]. However, the dynamicity of available cloud resources is not considered when making scheduling decisions.
Bala and Chana [28] put forward a hybrid heuristic scheduling approach (HHSA) to schedule scientific workflow tasks on the IaaS cloud. A fault-tolerant method is then developed based on a VM migration method that transfers the VM routinely in case of job failure occurrences resulting from the overutilization of resources. The simulation outcomes indicate that the HHSA performs better than Min–Min, Max–Min, MCT, PSO and GA by reducing the average makespan for massive scientific workflows such as CyberShake and Epigenome. The simulation results demonstrate the efficiency of the proposed method in improving the performance of scientific workflows by significantly reducing total mean execution time, standard deviation of execution time and makespan. However, the simulations did not use any fault tolerance parameter in the analysis to arrive at such conclusions. Existing rescheduling techniques for fault tolerance in MapReduce fail to fully account for the location of distributed data and the computation and storage overhead of rescheduling failed jobs [29]. Consequently, a single VM failure will amplify the completion time considerably.
A performance, power and failure-aware relaxed time task execution (PPF-RTTE) algorithm is presented as a performance-enforcing system, made up of a slowdown estimator and a scheduling technique [30]. The slowdown estimator determines, based on noisy slowdown data models acquired from modern slowdown meters, whether jobs will execute within their time limits, invoking the scheduling technique if required. Experimental outcomes show that the proposed approach can be effectively incorporated with modern slowdown meters to accomplish tight SLAs in real-world environments, while reducing expenditure by just 21 %. However, the PPF-RTTE algorithm is more of a trade-off between fault tolerance and scheduling performance. Also, no clear fault tolerance parameter is considered in the analysis.
2.3 Fault tolerance aware intelligent algorithms
in cloud
Fault tolerance intelligent algorithm-based task
scheduling techniques in cloud computing environment
are important in order to avoid tasks failure due to
heterogeneity and dynamicity of available cloud resources [31, 32]. Min–min based time and cost trade-off
(MTCT) is presented for multi-objective workflow
scheduling to aid fault recovery in cloud [33]. The
MTCT technique was evaluated using simulations with
four different real-world scientific workflows scenarios
to test the strength of the technique. The outcomes indicate that fault recovery has considerable influence on the two performance criteria, and the MTCT algorithm is valuable for real-life workflow systems when both optimization objectives are taken into account. However, being a multi-objective algorithm, the MTCT inherently tends to diversify attention across other parameters. Kumar and Aramudhan [34] introduce a task scheduling technique in cloud computing using a hybridization of the BA technique with a gravitational scheduling algorithm, taking into account deadline controls and a trust model. Tasks are mapped to resources on the basis of trust level. In experiments, the hybridized algorithm proficiently minimizes the makespan and the number of failed tasks in comparison with GVSA. However, the BA is also known for weak local search when dealing with complex problems. An intelligent technique called NSGA-II is also presented to solve the fault tolerance problem [24]. The NSGA-II technique is based on the Pareto dominance relationship, giving not a single optimal result but a set of results that are not dominated by one another. The performance of the technique integrated with GA is demonstrated by a number of experimental results. The average response time outcomes are highly correlated with the makespan outcomes; still, the general tendency is more complex to explain. Overall, the best outcomes are given by strategies favoring reliability. The related outcomes show the efficiency of the presented technique and GA for small- and medium-sized scheduling problems. However, the experimental results did not demonstrate performance on large and massive cloud-based scheduling problems.
A fault tolerance and QoS scheduling technique based on a content addressable network (CAN) in mobile social cloud computing (MSCC) is presented by [35]. CAN serves as the basis of MSCC to carefully track the positions of mobile devices. An experimental simulation of scheduling both with and without CAN is presented. The simulation results show that the CAN fault tolerance scheduling algorithm enhances cloud service execution time, finish time and reliability, minimizing the cloud service error rate. However, the CAN technique is only tested on a mobile cloud computing service, and there are no comparative results against any state-of-the-art intelligent algorithms. Tawfeek et al. [26] present a dynamic cloud task scheduling policy based on the ACO intelligent optimization technique. It is a random optimization search approach used for allocating incoming tasks to the VMs. Simulation outcomes show that cloud task scheduling based on ACO surpasses the FCFS and RR
algorithms. However, dynamic scheduling in this technique is not a priority in achieving the goal of fault tolerance. Particle swarm optimization (PSO) schemes are also nature-inspired, population-based intelligence techniques. The algorithms imitate the social characteristics of birds flocking and fish schooling. By initiating a randomly dispersed set of particles, called potential solutions, the scheme tries to evolve solutions according to a quality measure called the fitness function [36]. Yuan et al. [37] put forward a virtual machine scheduling scheme that takes into account the computing power of processing elements and also considers the computational density of the system. An improved PSO to address the VM scheduling problem in the cloud computing environment is presented. Verma and Kaushal [38] also present a bi-criteria priority-based particle swarm optimization (BPSO) to schedule workflow jobs in a given cloud computing environment on resources that reduce the execution cost and the execution time under a given deadline and budget. Similarly, PSO has been adapted in grid and cloud scheduling to solve the problems of load balancing [39], service selection in grids [40], tunable workflows in the cloud [41] and energy-aware task scheduling [42]. An energy-aware fault-tolerant scheduling (EAFTS) technique is put forward for public, multiuser cloud systems and investigates the three-way trade-off among reliability, performance and energy [43]. The technique consists of a static scheduling segment that runs on the task graph using workload inputs before execution, and a lightweight dynamic scheduler that migrates processes during execution in case of undue re-executions. Experimental results show that, compared with current VM or task replication methods, the proposed technique is capable of minimizing overall application failure rates by over 50 % with about 76 % total energy overhead. However, the replication strategy used in this technique may affect overall system performance as well as the dynamic scheduling policy.
The currently available resources in the cloud need to be consulted at every scheduling point to avoid task failures due to VM failure or overloading. Current dynamic scheduling techniques either do not take fault tolerance parameters into account or are only partially applied at different scheduling levels.
3 DCLCA fitness function

To derive the fitness function, consider that in cloud scheduling, the main goal of the providers is to reduce the completion time, while the aim of the clients is to reduce the price of accessing cloud resources by reducing the makespan time. Therefore, the fitness value of the DCLCA can be computed using the fitness function in Eq. 1

f(C) = \max_{i=1}^{m} \{ C_i \}    (1)

where C_i is the completion time of task i. The lesser the makespan, the better the efficiency of the algorithm, meaning less time is taken to execute the schedule.

The expected time to complete (ETC) is defined as the execution time for each task on a certain VM, obtained using the ETC matrix shown in Eq. 2. The number of tasks multiplied by the number of resources gives the dimension of the matrix, and its elements are represented as ETC(T_i, V_k). An ETC matrix related to this problem has n tasks T = {T_1, T_2, ..., T_n} and m VM resources represented as V = {V_1, V_2, ..., V_m}:

ETC = T × V =
⎡ T_1V_1  T_1V_2  ⋯  T_1V_m ⎤
⎢ T_2V_1               ⋮    ⎥
⎢   ⋮                  ⋮    ⎥
⎣ T_nV_1    ⋯       T_nV_m ⎦    (2)

Also, P_f(T_i, V_k) is defined as the failure probability of running a task with security demand (SD) on a resource with trust level (TL). The SD stands for the security demand of the application at the time of submitting tasks; the higher the SD value, the stricter the security constraint of the application. The trust model appraises the VM site's reliability, namely the TL; the higher the TL value, the higher the VM's reliability [24, 44]. A task failure model is described as a function of the difference between the task's demand and the machine's security. Equation 3 states the failure probability for scheduling a task T_i with a specific SD_i value on the VM V_k with trust value TL_k:

P_f(T_i, V_k) = { 0,                          if SD_i ≤ TL_k
               { 1 − e^{−α(SD_i − TL_k)},     if SD_i > TL_k    (3)

where α is the failure coefficient, which is a fractional number.

4 Fault tolerance scheduling components

Figure 1 shows the scheduling components for the fault tolerance aware technique. We describe in detail the functionalities and synchronous workings of each of the components in this section.
Fig. 1 Fault tolerance scheduling components
4.1 Fault detector
A fault detector is a necessity in designing a fault tolerance mechanism. Detection algorithms differ in their areas of emphasis on specific parameters, for instance fault coverage, complexity and performance. Previous fault detection techniques are taxonomized by the system-level hierarchy, which is also used in this proposed fault tolerance LCA scheduling technique for failure discovery at the operating system level, the VM level and the application level. In addition, VMs introduce new facilities for fault detection [45].
The execution of tasks running on a VM can be scrutinized from a remote site, and task failures can be detected from abnormal internal execution information, such as abnormal cycles of system execution calls [46, 47]. VM detection can be achieved by executing a detection component located in the virtual machine managers (VMMs) that intermittently judges the fault status of each VM. One of the functions of the fault detector we present here traces the failed task or VM and then schedules a healing sub-module in succession with a healing or recovery LCA scheduling algorithm. The healing module directs the resultant healing sub-modules to recover the faults according to fault intensity and category. Fault healing is accomplished one fault after another until the task is fully recovered.
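A minimal sketch of the VMM-level polling idea (ours; the probe interface and the miss threshold are assumptions): a VM is flagged as failed once it reports unhealthy on several consecutive checks.

```python
def detect_failures(status_probe, probes=3):
    """Poll VM health `probes` times via `status_probe`, a callable that
    returns {vm_id: is_alive}; flag VMs that miss every consecutive check."""
    misses = {}
    for _ in range(probes):
        for vm, alive in status_probe().items():
            # a healthy report resets the counter; an unhealthy one increments it
            misses[vm] = 0 if alive else misses.get(vm, 0) + 1
    return {vm for vm, count in misses.items() if count >= probes}

# vm2 never responds, so three consecutive misses mark it as failed
probe = lambda: {"vm1": True, "vm2": False}
print(detect_failures(probe))   # {'vm2'}
```

The detector's output would then drive the healing module described above, scheduling recovery sub-modules for each flagged VM.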
4.2 Task migration
The task migration process involves reassigning jobs from the queues of faulty resources to the heads of the queues of idle resources when those are accessible. It also solves the task fragmentation problem. The task migration algorithm reschedules aborted or failed tasks (Tn) to another available or under-loaded virtual machine VMj, known as the backup site. If some jobs do not complete execution on a particular VM for some reason (such as job overloading or VMi failure), the aborted or suspended jobs (Tn) can be instantly migrated to another VMj for execution. Job migration increases resource utilization and also provides alternative resources in case of VM failure or overloading, as shown in Fig. 2.

According to Rathore and Chana [48], task migration algorithms can be very helpful in solving the following issues during scheduling.
Fig. 2 Flowchart of task migration

• Task migration algorithms are helpful in providing fault tolerance awareness when executing long-running tasks, and during VM failure, VM overload or system maintenance.
• Task migration algorithms can be very useful in handling the problem of load balancing in an overloaded system. If a VM in a pool suddenly becomes overloaded, all the tasks on that VM can be migrated to an under-loaded VM.
• Task migration can be motivated by resource requests. For instance, tasks may require the use of massive databases, accessible on a dedicated VM of the cloud.
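The reassignment step can be sketched as follows (our illustration; the round-robin spread over idle VMs is an assumption, as the text only specifies moving tasks to the heads of idle queues):

```python
from collections import deque

def migrate_tasks(vm_queues, failed_vm, idle_vms):
    """Move the pending tasks of a faulty VM to the heads of the queues
    of idle VMs, emptying the faulty VM's queue."""
    stranded = vm_queues[failed_vm]
    vm_queues[failed_vm] = deque()
    for i, task in enumerate(stranded):
        target = idle_vms[i % len(idle_vms)]   # spread tasks over idle VMs
        vm_queues[target].appendleft(task)     # head of the idle queue
    return vm_queues

queues = {0: deque(["t1", "t2"]), 1: deque(["t3"]), 2: deque()}
migrate_tasks(queues, 0, [2])   # VM 0 failed; VM 2 is idle
print(list(queues[2]))          # ['t2', 't1'] - t2 was pushed to the head last
```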
5 Dynamic clustering league championship
algorithm
The proposed dynamic clustering league championship algorithm (DCLCA) task scheduling technique is designed by improving the LCA intelligent algorithm, which is inspired by the analogy of sporting contests. The dynamic clustering algorithm is utilized to update and reflect the current status of the cloud VM resources, as shown in Fig. 3.
5.1 Task clustering
The main purpose of task clustering is to allocate any cluster of tasks to be executed on any accessible VM dynamically. Figure 3 shows the dynamic task clustering steps designed to achieve DCLCA. Consider a subset of tasks j_n ∈ J, where P_n represents a partition P_n = {j_1, j_2, j_3, ..., j_n} of J into n clusters. Therefore,

1. j_i ≠ ∅, i = 1, 2, 3, ..., n
2. j_i ∩ j_j = ∅, i, j = 1, 2, 3, ..., n, i ≠ j
3. ∪_{i=1}^{n} j_i = J
Fig. 3 Dynamic task clustering
This clustering step is basically meant to identify the finest task-cluster-to-VM mapping, using the cloud information system (CIS) to dynamically obtain the number of available virtual machines (nVM), the anticipated execution time of a cluster and the capacity of the selected VM. Conversely, as a result of the dynamic characteristics of both tasks and resources, the size of a cluster is tentative, for more resourceful usage. At each scheduling point, the current CIS information is used to determine the current number of available VMs in order to dynamically partition the tasks in accordance with the current number of available resources. Algorithm 1 shows the dynamic clustering pseudo-code.
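A minimal sketch of this dynamic partitioning step (ours, not the authors' pseudo-code): cluster sizes are derived from the VM count the CIS currently reports, and the resulting clusters satisfy the three partition conditions above.

```python
def dynamic_clusters(tasks, n_vm):
    """Partition the pending task list J into at most n_vm disjoint,
    non-empty clusters, with the cluster size derived from the current
    number of available VMs reported by the CIS."""
    if not tasks:
        return []
    n_vm = min(n_vm, len(tasks))     # never create empty clusters
    size = -(-len(tasks) // n_vm)    # ceil(len(tasks) / n_vm)
    return [tasks[i:i + size] for i in range(0, len(tasks), size)]

print(dynamic_clusters(["j1", "j2", "j3", "j4", "j5", "j6", "j7"], 3))
# [['j1', 'j2', 'j3'], ['j4', 'j5', 'j6'], ['j7']]
```

At each scheduling point the function would be called again with the latest nVM, so the partition tracks the changing resource pool.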
Algorithm 1. Dynamic Clustering

1. Initialization
2. Get nVM from the CIS
3. J = {j1, j2, j3, ..., jn};
4. Pi = { }, i = 1, 2, 3, ..., n;
5. k = 0
6. Let size = n/(nVM)
7. Current step:
8. While (k < n) and (J ≠ ∅) Do
9.    select the next cluster Pk of size tasks from J
10.   size = (|J|)/(nVM)
11.   k = k + 1
12.   Get the current nVM from the CIS
13. EndWhile

Table 1 Parameters matching

LCA                        | EA                           | Remark
---------------------------|------------------------------|-------------------------------
League L                   | Population                   | Generate from dataset
Week t                     | Iteration                    | Number of runs
Team i                     | ith member of the population | ith task to be executed
Formation X_i^t            | Solution                     | Present best solution
Playing strength f(X_i^t)  | Fitness function             | Based on an objective function
Number of seasons S        | Maximum iterations           | Maximum schedules
5.2 Parameters matching
In order to achieve optimization with the proposed DCLCA algorithm in scheduling cloud tasks, we must first match the corresponding variables or parameters of the two systems. To achieve this, a simple comparison is made with the variables of a known evolutionary algorithm (EA), and the following matching is obtained (Table 1).
5.3 Winner/loser determination
One of the most important features of the LCA is the winner/loser determination technique [49]. In this research work, we utilize this feature in determining which cluster of tasks is scheduled on which VM in the IaaS cloud. In a cloud computing system, tasks sent by users contend for access to resources for effective scheduling, and their best fitness value is evaluated on the basis of a win/loss/tie for each of the tasks. For instance, in a football league, each club gets three points for a victory, zero for a defeat and one for a draw. Ignoring irregular abnormalities, which may ensnare even outstanding clubs in a variety of unsuccessful outcomes, it is probable that a more dominant club with a superior playing pattern defeats the lesser team. In an ideal league situation free from uncertainty effects, a linear correlation can readily be assumed between the playing pattern of a club and the result of its matches. Utilizing the fitness function condition, the winner/loser decision in LCA is determined in a stochastic approach using the criterion that the probability of winning, or of optimally scheduling tasks, is relative to its
degree of fitness value. Given cloud tasks i and j, which are sent to access cloud resources (VMs) at a given time t, with formations x_i^t and x_j^t and fitness (strength) values f(x_i^t) and f(x_j^t), respectively, let p_i^t represent the probability that task i accesses the VM resources instead of task j at time t (p_j^t is defined correspondingly), and let \hat{f} be an ideal fitness value. Then

\frac{f(x_j^t) - \hat{f}}{f(x_i^t) - \hat{f}} = \frac{p_i^t}{p_j^t}   (4)

From the LCA idealized rule, we deduce that

p_i^t + p_j^t = 1   (5)

From Eqs. 4 and 5 above, we solve for p_i^t:

p_i^t = \frac{f(x_j^t) - \hat{f}}{f(x_i^t) + f(x_j^t) - 2\hat{f}}   (6)

To find the winner or loser, a random number between 0 and 1 is generated; if the generated number is \le p_i^t, task i won and task j lost, else j won and i lost. This method of finding the winner or loser is in line with the idealized rules. If f(x_i^t) approaches f(x_j^t), then p_i^t approaches 1/2; but if f(x_j^t) becomes far greater than f(x_i^t), also written f(x_j^t) \gg f(x_i^t), then p_i^t tends to one. Since the value of \hat{f} may be unobtainable in practice, we use instead the best function value found so far, that is, \hat{f}^t = \min_{i=1,\ldots,L} \{ f(B_i^t) \}.

Using the strengths and weaknesses of each cluster of tasks, we created a good fitness value by taking different constraints into consideration. Likewise, an artificial analysis, SWOT (i.e., strengths, weaknesses, opportunities and threats), is carried out to generate an appropriate focus strategy. Considering, as a rule, each cluster of tasks with its recent best fitness value, while planning the necessary changes suggested by the artificial analysis, the fresh solution x_i^{t+1} = (x_{i1}^{t+1}, x_{i2}^{t+1}, \ldots, x_{in}^{t+1}) for a cluster of tasks i, where i ranges from 1, \ldots, L, at time t + 1 can be evaluated based on [50] as presented in Eq. 7

x_{id}^{t+1} =
\begin{cases}
b_{id}^t + y_{id}^t \big( w_1 r_{1id}(x_{id}^t - x_{kd}^t) + w_1 r_{2id}(x_{id}^t - x_{jd}^t) \big), & \text{if } f(x_i^t) > f(x_j^t) \wedge f(x_k^t) > f(x_l^t) \\
b_{id}^t + y_{id}^t \big( w_2 r_{1id}(x_{kd}^t - x_{id}^t) + w_1 r_{2id}(x_{id}^t - x_{jd}^t) \big), & \text{if } f(x_i^t) > f(x_j^t) \wedge f(x_l^t) > f(x_k^t) \\
b_{id}^t + y_{id}^t \big( w_1 r_{1id}(x_{id}^t - x_{kd}^t) + w_2 r_{2id}(x_{jd}^t - x_{id}^t) \big), & \text{if } f(x_j^t) > f(x_i^t) \wedge f(x_k^t) > f(x_l^t) \\
b_{id}^t + y_{id}^t \big( w_2 r_{1id}(x_{kd}^t - x_{id}^t) + w_2 r_{2id}(x_{jd}^t - x_{id}^t) \big), & \text{if } f(x_j^t) > f(x_i^t) \wedge f(x_l^t) > f(x_k^t)
\end{cases}   (7)
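For illustration, the stochastic winner/loser rule of Eqs. 4–6 can be sketched in Python. This is a minimal sketch: the fitness values below are hypothetical, and the ideal value \hat{f} is replaced by the best fitness found so far, as described above.

```python
import random

def win_probability(f_i, f_j, f_best):
    """p_i from Eq. 6: probability that task i accesses the VM instead of task j.

    f_i, f_j -- fitness (strength) values of tasks i and j
    f_best   -- best fitness found so far, standing in for the ideal value f-hat
    """
    return (f_j - f_best) / (f_i + f_j - 2.0 * f_best)

def decide_winner(f_i, f_j, f_best, rng=random):
    """Draw r in [0, 1); task i wins iff r <= p_i, in line with the idealized rules."""
    p_i = win_probability(f_i, f_j, f_best)
    return "i" if rng.random() <= p_i else "j"

# Hypothetical fitness values: p_i = (180 - 100) / (120 + 180 - 200) = 0.8,
# and p_i + p_j = 1 as required by Eq. 5.
print(win_probability(120.0, 180.0, 100.0))  # 0.8
```

Note that when f_i = f_j the probability reduces to exactly 1/2, matching the limiting behaviour discussed above.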
In Eq. 7 above, d is the dimension index; r_{1id} and r_{2id} are uniform random values between zero and one, and w_1 and w_2 are coefficients that weigh the contributions of the "retreat" and "approach" mechanisms, respectively. It is also important to note that the different signs in the parentheses produce movement toward the winner or retreat away from the loser. To generate a new schedule, the random number of changes made in B_i^t can be calculated using Eq. 8
q_i^t = \left\lceil \frac{\ln\left(1 - \left(1 - (1 - p_c)^{\,n - q_0 + 1}\right) r\right)}{\ln(1 - p_c)} \right\rceil + q_0 - 1, \qquad q_i^t \in \{q_0, q_0 + 1, \ldots, n\}   (8)
where r represents a random number generated between zero and one, and p_c < 1, p_c \neq 0, denotes a control parameter.
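As an illustration, the formation update of Eq. 7 and the number-of-changes draw of Eq. 8 can be sketched in Python. This is a sketch under the LCA formulation of [7, 50]: b_i and y_i are taken as cluster i's current best formation and its binary change mask, and x_k, x_l as the formations of the forthcoming opponent k and k's previous opponent l; these roles follow the LCA literature rather than an explicit definition in this section.

```python
import math
import random

def lca_update(b_i, y_i, x_i, x_j, x_k, x_l, f, w1=0.5, w2=0.5, rng=random):
    """One formation update per Eq. 7; all vectors share the same dimension n."""
    i_beat_j = f(x_i) > f(x_j)   # inequality conditions as printed in Eq. 7
    k_beat_l = f(x_k) > f(x_l)
    new_x = []
    for d in range(len(b_i)):
        r1, r2 = rng.random(), rng.random()
        if i_beat_j and k_beat_l:        # branch 1: retreat from k, move off j
            step = w1 * r1 * (x_i[d] - x_k[d]) + w1 * r2 * (x_i[d] - x_j[d])
        elif i_beat_j:                   # branch 2: approach k, move off j
            step = w2 * r1 * (x_k[d] - x_i[d]) + w1 * r2 * (x_i[d] - x_j[d])
        elif k_beat_l:                   # branch 3: retreat from k, approach j
            step = w1 * r1 * (x_i[d] - x_k[d]) + w2 * r2 * (x_j[d] - x_i[d])
        else:                            # branch 4: approach both k and j
            step = w2 * r1 * (x_k[d] - x_i[d]) + w2 * r2 * (x_j[d] - x_i[d])
        new_x.append(b_i[d] + y_i[d] * step)
    return new_x

def number_of_changes(n, q0=1, pc=0.01, rng=random):
    """Number of changes in B_i per Eq. 8: a truncated geometric draw in {q0, ..., n}."""
    r = rng.random()
    q = math.ceil(math.log(1.0 - (1.0 - (1.0 - pc) ** (n - q0 + 1)) * r)
                  / math.log(1.0 - pc)) + q0 - 1
    return min(max(q, q0), n)
```

With a small p_c such as the 0.01 used in Sect. 6, the draw of Eq. 8 is close to uniform on {q_0, ..., n}; a larger p_c concentrates the number of changes near q_0.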
6 Experimental setup

To evaluate the proposed DCLCA fault-tolerant aware task scheduling technique, a cloud simulator has been used. The implementation and evaluation are done using the CloudSim 3.0.3 toolkit [51] on the Eclipse IDE Luna release 4.4.0. The simulation is done using two different scenarios. Task traces in the first scenario are generated from the Parallel Workload Archive [52], which contains 73,496 tasks. This workload archive is made available by the San Diego Supercomputer Center (SDSC) and is in the standard workload format (SWF) recognized by CloudSim. The task traces in the second scenario are generated from CloudSim's PlanetLab workload. The DCLCA parameters are set at w_1 = w_2 = 0.5 and p_c = 0.01; the selection of these values is based on [7].
Algorithm 2. Dynamic Clustering League Championship Algorithm (DCLCA)
1. Start
2. Obtain information about the list of tasks to be scheduled
3. Initialize population size L, number of initial maximum iterations S and set t = 1
4. Obtain the number of available VMs from the CIS
5. Generate a present best solution for L − 1 with initial iteration through each task formation
6. Generate the task clusters dynamically using Call (Algorithm 1)
7. Set task cluster formations and establish the fitness values for each cluster of tasks
8. Let the initialization be the tasks' present best solution
9. While (t ≤ S)
10.   Using the task cluster schedule at t, find the winner/loser among each pair of tasks by utilizing the probability function defined in equation 3
11.   t = t + 1
12.   For i = 1 to L
13.     Formulate a new optimal solution for the next task using SWOT analysis in equation 7, while taking into consideration the task's new optimal solution
14.     Compute fitness values for the new system using equation 1
15.     If the new solution is fitter than the current best Then replace it as the new optimal solution
16.   End For
17.   If mod (t, L − 1) = 0
18.     Check fault
19.     If fault = True Then Call (Task Migration)   //**Task Migration**//
20.     Output a task with best fitness value as final optimal solution
21.     Output the smallest makespan using equation 1
22.     Output task failure ratio using equation 4
23.     Output task failure slowdown using equation 5
24.   End if
25. EndWhile
26. End
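To make the control flow of Algorithm 2 concrete, a minimal Python skeleton follows. It is a sketch only: the clustering routine (Algorithm 1), the fitness function (Eq. 1), the fault detector and the migration strategy are hypothetical callables supplied by the caller, and the Eq. 7 update is reduced to a fitness re-evaluation.

```python
def dclca_schedule(tasks, vms, L, S, fitness, cluster, migrate, has_fault):
    """Skeleton of Algorithm 2. The clustering step (Algorithm 1), the fitness
    function (Eq. 1), the fault detector and the task-migration strategy are
    supplied as callables; all of them are hypothetical stand-ins here."""
    clusters = cluster(tasks)                        # step 6: dynamic clustering
    best = {c: fitness(c, vms) for c in clusters}    # steps 7-8: initial best solutions
    for t in range(1, S + 1):                        # steps 9-25: championship loop
        for c in clusters:                           # steps 12-16: SWOT-style updates
            candidate = fitness(c, vms)              # stand-in for the Eq. 7 update
            if candidate < best[c]:                  # keep the fitter solution
                best[c] = candidate
        if t % (L - 1) == 0 and has_fault():         # steps 17-19: periodic fault check
            migrate(clusters)                        # task migration on detected failure
    return min(best.values())                        # step 21: smallest makespan
```

The point of interest is the cadence of steps 17–19: the fault check runs whenever mod(t, L − 1) = 0, so the cost of fault detection and migration is amortized over L − 1 iterations of the main loop.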
Five different scheduling algorithms are used in the performance comparison: the proposed DCLCA and four existing techniques, MTCT [33], MAXMIN [53], ACO [26] and NSGA-II [24]. The ACO parametric values are set according to [26]: number of ants in the colony = 10, evaporation factor ρ = 0.4, pheromone tracking weight α = 0.3, heuristic information weight β = 1 and pheromone updating constant Q = 100. The NSGA-II parameters are set according to [24]: population size = 1000, maximal iterations = 1000, crossover rate = 0.5 and mutation rate = 0.1. The experiments
are repeated ten times for each of the techniques and the
averages of makespan time, failure ratio, failure slowdown
and the performance improvement rate (in percentage) are
observed. The two scenarios of the experiments are repeated with the same parametric values for all the chosen
scheduling techniques.
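Two of the observed metrics, the failure ratio (Eq. 12) and the failure slowdown (Eq. 13) defined in Sect. 6.1, reduce to simple ratios; a minimal Python sketch with hypothetical failure counts and delays:

```python
def failure_ratio(failures_dclca, failures_other):
    """FR (Eq. 12): total failed tasks under DCLCA over total failed tasks
    under the compared scheme; FR < 1 means DCLCA fails fewer tasks."""
    return sum(failures_dclca) / sum(failures_other)

def failure_slowdown(total_failure_delay, num_tasks):
    """FD (Eq. 13): failure-induced delay relative to failure-free execution
    time, averaged over the total number of tasks."""
    return total_failure_delay / num_tasks

# Hypothetical repeated runs: 3 failed tasks under DCLCA vs 5 under a baseline.
print(failure_ratio([1, 2], [2, 3]))  # 0.6
print(failure_slowdown(12.0, 100))    # 0.12
```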
6.1 Performance metric

To measure and compare the performance of the fault tolerance task scheduling mechanism in the IaaS cloud computing environment, a number of performance parameters are considered. These include the failure ratio, the failure slowdown and the performance improvement rate [54].

6.1.1 Makespan time

The makespan is the maximum completion time, that is, the time at which the IaaS cloud system completes the latest task. Let C_{ij} denote the time that resource V_i needs to complete task T_j; then RC_i is the total time that resource V_i takes to complete all the tasks submitted to it. Equation 1 defines the makespan in the cloud environment mathematically.

6.1.2 Failure ratio

The failure ratio (FR) is the ratio of the total number of failed tasks in the proposed technique to the total number of failed tasks in the other scheduling technique. The proposed technique is an improvement if the value of FR is less than one; it is calculated using Eq. 12

FR = \frac{\sum_0^n \text{no. of failures(DCLCA)}}{\sum_0^n \text{no. of failures(other scheme)}}   (12)

6.1.3 Failure slowdown

Failure slowdown (FD) is the ratio of the time delay or interruption caused by failure to the failure-free task execution time, averaged over the total number of tasks. The FD of the proposed DCLCA fault-tolerant task scheduling technique should be smaller than that of the other scheduling techniques used for the comparison and is calculated using Eq. 13

FD = \frac{\text{Time delay caused by failure to failure-free task execution time}}{\text{Average total number of tasks}}   (13)

6.1.4 Performance improvement rate

Performance improvement rate (PIR) is defined as the percentage of performance improvement (or reduction in makespan) of the proposed DCLCA technique with regard to the other techniques and is calculated using Eq. 14

PIR(\%) = \frac{\text{makespan(other scheme)} - \text{makespan(DCLCA)}}{\text{makespan(DCLCA)}} \times 100   (14)

7 Results and discussion

This section presents and discusses the results of the experiments in the two formulated scenarios, so as to evaluate the efficiency of the proposed DCLCA technique.

7.1 First scenario
In the first scenario, five cloud users are created with five brokers and two data centers. The first data center contains three hosts, while the second contains two. Ten VMs are also created using the Time_Shared policy, each with 512 MB of memory, an image size of 10,000 MB and one CPU, managed by Xen as the virtual machine manager (VMM) on a Linux operating system. The host memory is 2048 MB, with a storage capacity of 1,000,000 and a bandwidth of 10,000. The number of tasks (cloudlets) submitted ranges between 10 and 100, each with a length of 800,000 and a file size of 600. Figures 4, 5 and 6 and Table 2 present the results obtained from this scenario.
Figure 4 shows that the makespan (completion time of the last task to be executed) increases as the number of cloudlets increases for all the techniques under consideration. When a small number of tasks is sent for execution, all the algorithms return relatively similar makespan times, with the DCLCA showing only a slight improvement. As the number of tasks increases from 10 to 100, the gap between the makespan times of the algorithms keeps widening, with the DCLCA returning less time. This means that the
DCLCA takes less time to execute the cloud tasks than the remaining algorithms under consideration. Table 2 shows that the DCLCA presents a PIR (%) of 57.8, 53.6, 24.3 and 13.4 over the MTCT, MAXMIN, ACO and NSGA-II, respectively.

[Figure: makespan versus number of tasks (10-100) for MTCT, MAXMIN, ACO, NSGA-II and DCLCA]
Fig. 4 Makespan time of first scenario

[Figure: failure ratio versus number of tasks (10-100) for MTCT, MAXMIN, ACO, NSGA-II and DCLCA]
Fig. 5 Failure ratio of first scenario
Figures 5 and 6 present the failure ratio and the failure slowdown, while Table 2 presents the performance improvement rate (in percentage). The failure ratio decreases as the number of tasks increases for all the techniques; the lower the failure ratio, the better the task execution success rate. The DCLCA also shows improvement in the failure ratio, returning a lower ratio than the other techniques as the number of tasks increases. The results obtained show that the failure slowdown of our proposed DCLCA technique is less than that of the other intelligent algorithms. They also show that the PIR (%) of our proposed technique improved more than that of the other algorithms with respect to minimum makespan time and reliability. This means that the proposed DCLCA scheduling technique is more fault-tolerant and reliable than the other algorithms under consideration. The likely reason the DCLCA outperforms the existing approaches under consideration is its dynamic nature, which immediately reflects the current state of the resources in the heterogeneous environment. It may also be a result of its efficient performance in both local and global search as compared to other meta-heuristics.
7.2 Second scenario
[Figure: failure slowdown versus number of tasks (10-100) for MTCT, MAXMIN, ACO, NSGA-II and DCLCA]
Fig. 6 Failure slowdown of first scenario
Table 2 DCLCA performance improvement rate (%) on makespan

                     MTCT      MAXMIN    ACO       NSGA-II   DCLCA
Total makespan       14,042.7  13,671.9  11,057.4  10,099.5  8898.8
PIR % over MTCT      -         2.7       27.0      39.0      57.8
PIR % over MAXMIN    -         -         23.6      35.4      53.6
PIR % over ACO       -         -         -         9.5       24.3
PIR % over NSGA-II   -         -         -         -         13.5
In the second scenario, we set up 10 cloud users with 10 brokers and five data centers. Each data center contains three hosts, making a total of 15 hosts. A total of 25 VMs are also created using the Space_Shared policy, each with 512 MB of memory, an image size of 20,000 MB and one CPU, managed by Xen as the VMM on a Linux operating system. The host memory is 2048 MB, with a storage capacity of 1,000,000 and a bandwidth of 10,000. The number of tasks (cloudlets) submitted ranges between 50 and 500, each with a length of 900,000 and a file size of 1000. Figures 7, 8 and 9 and Table 3 present the results obtained from this scenario.
Similarly, Fig. 7 shows that when a small number of tasks is sent for execution, all the algorithms return relatively similar makespan times, with the GA-based NSGA-II and the DCLCA showing only a slight improvement with lesser makespan time.
[Figure: makespan versus number of tasks (50-500) for MTCT, MAXMIN, ACO, NSGA-II and DCLCA]
Fig. 7 Makespan time of second scenario
[Figure: failure ratio versus number of tasks (50-500) for MTCT, MAXMIN, ACO, NSGA-II and DCLCA]
Fig. 8 Failure ratio of second scenario
[Figure: failure slowdown versus number of tasks (50-500) for MTCT, MAXMIN, ACO, NSGA-II and DCLCA]
Fig. 9 Failure slowdown of second scenario
When the number of tasks increases from 50 to as much as 500, the gap between the makespan times of the algorithms keeps widening, with the DCLCA returning less time. Table 3 shows that the DCLCA presents a PIR (%) of 60.0, 38.9, 31.5 and 31.2 over the MTCT, MAXMIN, ACO and NSGA-II, respectively. This shows that the DCLCA uses less time to execute the cloud tasks than the remaining algorithms under consideration.
In Figs. 7, 8 and 9 and Table 3, the DCLCA shows improvement in terms of failure ratio, returning a lower ratio than the other algorithms as the number of tasks increases. The results obtained imply that the failure slowdown of our proposed DCLCA technique is less than that of the other intelligent algorithms. The simulation evaluation also shows that the performance improvement rate of our proposed technique improved more, and more quickly, than that of the other algorithms with respect to minimum makespan time and reliability. This shows that the proposed fault-tolerant aware DCLCA scheduling technique performs better and is more reliable than the other algorithms under consideration.
The proposed DCLCA approach is valuable for the management of dynamic and responsive faults by predicting task failures during scheduling. By implication, it explores the prospects for failure projection and handling in cloud applications so as to minimize the depletion of resources by tasks that fail in the process of execution. This study is also important because of the performance variations experienced among cloud resources and the imminent occurrence of failures during application scheduling, in the environment and in the network. The contributions of this study help in making the execution of tasks faultless on cloud computing systems.
Table 3 DCLCA performance improvement rate (%) on makespan

                     MTCT      MAXMIN    ACO       NSGA-II   DCLCA
Total makespan       59,411.3  51,571.4  48,821.0  48,714.5  37,131.4
PIR % over MTCT      -         15.2      21.7      22.0      60.0
PIR % over MAXMIN    -         -         5.6       5.9       38.9
PIR % over ACO       -         -         -         0.29      31.5
PIR % over NSGA-II   -         -         -         -         31.2

8 Summary and conclusion
We proposed a DCLCA technique for dynamic clustering fault tolerance aware intelligent scheduling using the LCA optimization algorithm. Task migration and fault detection strategies are also implemented as additional fault reduction components, together with an efficient method of scheduling, in order to minimize makespan time. To evaluate the proposed technique, the CloudSim simulation toolkit is
used to create two different scenarios in an IaaS cloud environment using two different datasets.
The results of our experiments show that the proposed DCLCA intelligent technique achieved a significant reduction in task failure, as measured in terms of the failure ratio, failure slowdown and PIR parameters. They also show that the proposed DCLCA technique performs better than the MTCT, MAXMIN, ACO and NSGA-II by returning a reduced makespan time, in addition to the above-mentioned fault tolerance parameters, in the two simulated scenarios. In view of these experimental results, our proposed DCLCA fault-aware technique provides better quality scheduling results than the other intelligent techniques, which indicates that the technique is very appropriate for task execution in the cloud computing environment. Therefore, the authors recommend hybridization of the LCA with other effective intelligent scheduling techniques in order to produce better performance in terms of fault tolerance. We also recommend testing the technique in a real cloud environment.
Acknowledgments The authors would like to acknowledge and
appreciate the support of Universiti Teknologi Malaysia (UTM),
Research University Grant Q. J130000.2528.05H87 and the Nigerian
Tertiary Education Trust Fund (TetFund) for their support.
References
1. Gital AY, Ismail AS, Chen M, Chiroma H (2014) A framework
for the design of cloud based collaborative virtual environment
architecture. In: Proceedings of the international multi conference
of engineers and computer scientists
2. Lu K, Yahyapour R, Wieder P, Yaqub E, Abdullah M, Schloer B,
Kotsokalis C (2016) Fault-tolerant service level agreement lifecycle management in clouds using actor system. Future Gener
Comput Syst 54:247–259
3. Moon Y-H, Youn C-H (2015) Multihybrid job scheduling for
fault-tolerant distributed computing in policy-constrained
resource networks. Comput Netw 82:81–95
4. He J, Dong M, Ota K, Fan M, Wang G (2014) NetSecCC: a
scalable and fault-tolerant architecture for cloud computing
security. Peer-to-Peer Netw Appl 9(1):67–81
5. Nawi NM, Khan A, Rehman MZ, Chiroma H, Herawan T (2015)
Weight optimization in recurrent neural networks with hybrid
metaheuristic Cuckoo search techniques for data classification.
Math Probl Eng. doi:10.1155/2015/868375
6. Mills B, Znati T, Melhem R (2014) Shadow computing: an
energy-aware fault tolerant computing model. In: 2014 International conference on computing, networking and communications
(ICNC), IEEE, pp 73–77
7. Kashan HA (2009) League championship algorithm: a new
algorithm for numerical function optimization. In: International
conference of soft computing and pattern recognition, 2009.
SOCPAR’09, IEEE, pp 43–48
8. Kashan HA, Karimi B (2012) A new algorithm for constrained
optimization inspired by the sport league championships. In:
2010 IEEE congress on evolutionary computation (CEC), IEEE,
pp 1–8
9. Abdulhamid SM, Latiff MSA, Ismaila I (2014) Tasks scheduling
technique using league championship algorithm for makespan
minimization in IAAS cloud. ARPN J Eng Appl Sci
9(12):2528–2533
10. Abdulhamid SM, Latiff MSA, Madni SHH, Oluwafemi O (2015) A survey of league championship algorithm: prospects and challenges. Indian J Sci Technol 8(S3):101–110
11. Yang Y-G, Tian J, Lei H, Zhou Y-H, Shi W-M (2016) Novel
quantum image encryption using one-dimensional quantum cellular automata. Inf Sci 345:257–270
12. Dondi R, El-Mabrouk N, Swenson KM (2014) Gene tree correction for reconciliation and species tree inference: complexity
and algorithms. J Discrete Algorithms 25:51–65
13. Abdulhamid SM, Latiff MSA, Bashir MB (2014) On-demand
grid provisioning using cloud infrastructures and related virtualization tools: a survey and taxonomy. Int J Adv Stud Comput Sci
Eng IJASCSE 3(1):49–59
14. Kushwah VS, Goyal SK, Narwariya P (2014) A survey on various
fault tolerant approaches for cloud environment during load
balancing. Int J Comput Netw Wirel Mobile Commun 4(6):25–34
15. Yang W, Zhang C, Shao Y, Shi Y, Li H, Khan M, Hussain F,
Khan I, Cui L-J, He H (2014) A hybrid particle swarm optimization algorithm for service selection problem in the cloud. Int
J Grid Distrib Comput 7(4):1–10
16. Hussin M, Lee YC, Zomaya AY (2010) Dynamic job-clustering
with different computing priorities for computational resource
allocation. In: Proceedings of the 2010 10th IEEE/ACM international conference on cluster, cloud and grid computing, IEEE
Computer Society, pp 589–590
17. Vidhate D, Patil A, Guleria D (2010) Dynamic cluster resource
allocations for jobs with known memory demands. In: Proceedings of the international conference and workshop on emerging
trends in technology, ACM, pp 64–69
18. Abdulhamid SM, Latiff MSA, Bashir MB (2014) Scheduling techniques in on-demand grid as a service cloud: a review. J Theor Appl Inf Technol 63(1):10–19
19. Abdullahi M, Ngadi MA (2016) Symbiotic organism search
optimization based task scheduling in cloud computing environment. Future Gener Comput Syst 56:640–650
20. Madni SHH, Latiff MSA, Coulibaly Y (2016) An appraisal of
meta-heuristic resource allocation techniques for IaaS cloud.
Indian J Sci Technol 9(4):1–14. doi:10.17485/ijst/2016/v9i4/
80561
21. Chiroma H, Shuib NLM, Muaz SA, Abubakar AI, Ila LB, Maitama JZ (2015) A review of the applications of bio-inspired
flower pollination algorithm. Procedia Comput Sci 62:435–441
22. Boru D, Kliazovich D, Granelli F, Bouvry P, Zomaya AY (2015)
Energy-efficient data replication in cloud computing datacenters.
Clust Comput 18(1):385–402. doi:10.1007/s10586-014-0404-x
23. Gangeshwari R, Subbiah J, Malathy K, Miriam D (2012)
HPCLOUD: a novel fault tolerant architectural model for
hierarchical MapReduce. In: 2012 international conference on
recent trends in information technology (ICRTIT), IEEE,
pp 179–184
24. Gąsior J, Seredyński F (2013) Multi-objective parallel machines
scheduling for fault-tolerant cloud systems. In: Joanna K, Di
Martino B, Talia D, Xiong K (eds) Algorithms and architectures
for parallel processing. Springer, Switzerland, pp 247–256.
doi:10.1007/978-3-319-03859-9_21
25. Ling Y, Ouyang Y, Luo Z (2012) A novel fault-tolerant
scheduling algorithm with high reliability in cloud computing
systems. J Converg Inf Technol 7(15):107–115. doi:10.4156/jcit.
vol7.issue15.13
26. Tawfeek M, El-Sisi A, Keshk A, Torkey F (2015) Cloud task
scheduling based on ant colony optimization. Int Arab J Inf
Technol (IAJIT) 12(2):129–137
27. Ganga K, Karthik S (2013) A fault tolerent approach in scientific
workflow systems based on cloud computing. In: 2013 international conference on pattern recognition, informatics and mobile
engineering (PRIME), IEEE, pp 387–390
28. Bala A, Chana I (2015) Autonomic fault tolerant scheduling
approach for scientific workflows in Cloud computing. Concurr
Eng 23(1):27–39. doi:10.1177/1063293X14567783
29. Bonvin N, Papaioannou TG, Aberer K (2010) A self-organized,
fault-tolerant and scalable replication scheme for cloud storage.
In: Proceedings of the 1st ACM symposium on cloud computing,
ACM, pp 205–216
30. Sampaio AM, Barbosa JG (2015) A performance enforcing
mechanism for energy-and failure-aware cloud systems. In: 2014
international green computing conference, IGCC 2014. doi:10.
1109/IGCC.2014.7039151
31. Patra PK, Singh H, Singh G (2013) Fault tolerance techniques
and comparative implementation in cloud computing. Int J
Comput Appl 64(14):37–41
32. Nawi NM, Khan A, Rehman M, Chiroma H, Herawan T (2015)
Weight optimization in recurrent neural networks with hybrid
metaheuristic Cuckoo search techniques for data classification.
Math Probl Eng 501:868375
33. Xu H, Yang B, Qi W, Ahene E (2016) A multi-objective optimization approach to workflow scheduling in clouds considering
fault recovery. KSII Trans Internet Inf Syst 10(3):976–995.
doi:10.3837/tiis.2016.03.002
34. Kumar VS, Aramudhan M (2014) Hybrid optimized list
scheduling and trust based resource selection in cloud computing.
J Theor Appl Inf Technol 69(3):434–442
35. Choi S, Chung K, Yu H (2014) Fault tolerance and QoS
scheduling using CAN in mobile social cloud computing. Clust
Comput 17(3):911–926
36. Kaveh A (2014) Particle swarm optimization. In: Advances in
metaheuristic algorithms for optimal design of structures.
Springer, Switzerland, pp 9–40. doi:10.1007/978-3-319-05549-7
37. Yuan H, Li C, Du M (2014) Optimal virtual machine resources
scheduling based on improved particle swarm optimization in
cloud computing. J Softw 9(3):705–708
38. Verma A, Kaushal S (2014) Bi-criteria priority based particle
swarm optimization workflow scheduling algorithm for cloud. In:
2014 recent advances in engineering and computational sciences
(RAECS), IEEE, pp 1–6
39. Ramezani F, Lu J, Hussain FK (2014) Task-based system load
balancing in cloud computing using particle swarm optimization.
Int J Parallel Program 42(5):739–754
40. Yang W, Zhang C, Shao Y, Shi Y, Li H, Khan M, Hussain F,
Khan I, Cui L-J, He H (2014) A hybrid particle swarm optimization algorithm for service selection problem in the cloud. Int
J Grid Distrib Comput 7(4):1–10. doi:10.14257/ijgdc.2014.7.4.01
41. Wu K (2014) A tunable workflow scheduling algorithm based on
particle swarm optimization for cloud computing. Master’s Projects, Paper 358. San José State University, USA
42. Zhang W, Xie H, Cao B, Cheng AM (2014) Energy-aware realtime task scheduling for heterogeneous multiprocessors with
particle swarm optimization algorithm. Math Probl Eng 2014:
1–9. doi:10.1155/2014/287475
43. Gao Y, Gupta SK, Wang Y, Pedram M (2014) An energy-aware
fault tolerant scheduling framework for soft error resilient cloud
computing systems. In: Design, automation and test in Europe
conference and exhibition (DATE), 2014, IEEE, pp 1–6
44. Hu Y, Gong B, Wang F (2010) Cloud model-based security-aware and fault-tolerant job scheduling for computing grid. In: ChinaGrid conference (ChinaGrid), 2010 fifth annual, IEEE, pp 25–30
45. Qiang W, Jiang C, Ran L, Zou D, Jin H (2015) CDMCR: multilevel fault-tolerant system for distributed applications in cloud.
Secur Commun Netw 2015:SCN-SI-077. doi:10.1002/sec.1187
46. Urgaonkar R, Wang SQ, He T, Zafer M, Chan K, Leung KK
(2015) Dynamic service migration and workload scheduling in
edge-clouds. Perform Eval 91:205–228. doi:10.1016/j.peva.2015.
06.013
47. Vobugari S, Somayajulu D, Subaraya BM (2015) Dynamic
replication algorithm for data replication to improve system
availability: a performance engineering approach. IETE J Res
61(2):132–141. doi:10.1080/03772063.2014.988757
48. Rathore N, Chana I (2014) Load balancing and job migration
techniques in grid: a survey of recent trends. Wirel Pers Commun
79(3):2089–2125
49. Yadav S, Nanda SJ (2015) League championship algorithm for
clustering. In: 2015 IEEE power, communication and information
technology conference (PCITC), IEEE, pp 321–326
50. Xu W, Wang R, Yang J (2015) An improved league championship algorithm with free search and its application on production scheduling. J Intell Manuf 1–10. doi:10.1007/s10845-015-1099-4
51. Buyya R, Ranjan R, Calheiros RN (2009) Modeling and simulation of scalable cloud computing environments and the
CloudSim toolkit: challenges and opportunities. In: International
conference on high performance computing & simulation, 2009.
HPCS’09, IEEE, pp 1–11
52. Parallel Workload Archive - SDSC-SP2-1998-4.swf (2015).
http://www.cs.huji.ac.il/labs/parallel/workload/l_sdsc_sp2/index.
html. Accessed 30 Jan 2015
53. Ramakrishnan L, Reed DA (2008) Performability modeling for
scheduling and fault tolerance strategies for scientific workflows.
In: Proceedings of the 17th international symposium on High
performance distributed computing, ACM, pp 23–34
54. Garg R, Singh AK (2014) Fault tolerant task scheduling on
computational grid using checkpointing under transient faults.
Arabian J Sci Eng 39(12):8775–8791