Monetary Cost Optimizations for Hosting Workflow-as-a-Service in IaaS Clouds

IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 4, NO. 1, JANUARY-MARCH 2016
Abstract—Recently, we have witnessed workflows from science and other data-intensive applications emerging on Infrastructure-as-a-Service (IaaS) clouds, and many workflow service providers offering workflow-as-a-service (WaaS). The major concern of WaaS providers is to minimize the monetary cost of executing workflows in the IaaS clouds. The selection of virtual machine (instance) types significantly affects the monetary cost and performance of running a workflow. Moreover, the IaaS cloud environment is dynamic, with high performance dynamics caused by the interference from concurrent executions and price dynamics like spot prices offered by Amazon EC2. Therefore, we argue that WaaS providers should have the notion of offering probabilistic performance guarantees for individual workflows to explicitly expose the performance and cost dynamics of IaaS clouds to users. We develop a scheduling system called Dyna to minimize the expected monetary cost given the user-specified probabilistic deadline guarantees. Dyna includes an A*-based instance configuration method for performance dynamics, and a hybrid instance configuration refinement for using spot instances. Experimental results with three scientific workflow applications on Amazon EC2 and a cloud simulator demonstrate (1) the ability of Dyna to satisfy the probabilistic deadline guarantees required by the users; (2) its effectiveness in reducing monetary cost in comparison with the existing approaches.

Index Terms—Cloud computing, cloud dynamics, spot prices, monetary cost optimizations, scientific workflows
be: P_{seqBand,type}(seqBand = x), P_{rndBand,type}(rndBand = x), P_{inBand,type}(inBand = x) and P_{outBand,type}(outBand = x) as the probabilistic distributions for the sequential I/O, random I/O, downloading and uploading network performance from/to the persistent storage, respectively. In our calibrations on Amazon EC2, P_{rndBand,type}(rndBand = x) conforms to normal distributions and the other three conform to Gamma distributions (Section 4). Given the I/O and network performance distributions and the corresponding I/O and networking data sizes, we model the execution time of a task on different instance types with probabilistic distribution functions (PDFs). For example, if the size of the input data on the disk is s_in, the probability of the time on reading the input data equalling s_in/x is P_{seqBand,type}(seqBand = x), assuming that reading the input data uses sequential accesses.

Having modeled the execution time of tasks as probabilistic distributions, we first introduce the implementation of the function estimate_cost. The monetary cost of a state s is estimated to be the sum of the expected monetary cost of each task running on the type of instance specified in s. Consider a task with on-demand instance type type and on-demand price p. We estimate the expected monetary cost of the task to be p multiplied by the expected execution time of the task on the type-type on-demand instance. Here, we have ignored the rounding monetary cost in the estimation. This is because in the WaaS environment, this rounding monetary cost is usually amortized among many tasks. Enforcing the instance hour billing model could severely limit the optimization space, leading to a suboptimal solution (a configuration plan with suboptimal monetary cost).

Another core evaluation function is estimate_performance. Given a state s and the execution time distribution of each task under the evaluated state s, we first calculate the execution time distribution of the entire workflow. Since the execution time of a task is now a distribution, rather than a static value, the execution time on the critical path is also dynamic. To have a complete evaluation, we apply a divide-and-conquer approach to get the execution time distribution of the entire workflow. Particularly, we decompose the workflow structure into three kinds of basic structures, as shown in Fig. 5. Each basic structure has n tasks (n >= 2). The decomposition is straightforward by identifying the basic structures in a recursive manner from the starting task(s) of the workflow.

The execution time distribution of each basic structure is calculated with the execution time distributions of individual tasks. For example, the execution time distribution of the structure in Fig. 5b is calculated as MAX(PDF_0, ..., PDF_{n-1}).

3.3 Hybrid Instance Configuration Refinement

We consider the adoption of spot instances as a refinement to the configuration plan obtained from the previous step (the A*-based instance configuration algorithm) to further reduce monetary cost. The major problem of adopting spot instances is that running a task on spot instances may suffer from out-of-bid events and fail to meet the deadline requirements. We propose a simple yet effective hybrid instance configuration to tackle this issue. The basic idea is, if the deadline allows, we can try to run a task on a spot instance first. If the task can finish on the spot instance, the monetary cost tends to be lower than the monetary cost of running the task on an on-demand instance. It is possible to try more than one spot instance, if the previous spot instance fails (as long as it can reduce the monetary cost and satisfy the probabilistic performance guarantee). If all spot instances in the hybrid instance configuration fail, the task is executed on an on-demand instance to ensure the deadline. When a task finishes its execution on a spot instance, it is checkpointed, and the checkpoint is stored on the persistent storage of the cloud (such as Amazon S3). This avoids triggering the re-execution of its precedent tasks. Dyna performs checkpointing only when the task ends, which is simple and has much less overhead than general checkpointing algorithms [15].

A hybrid instance configuration of a task is represented as a vector of both spot and on-demand instance types, as described in Section 2.2. The last dimension in the vector is the on-demand instance type obtained from the A*-based instance configuration step. The initial hybrid configuration contains only the on-demand instance type. Starting from the initial configuration, we repeatedly add spot instances at the beginning of the hybrid instance configuration to find better configurations. Ideally, we can add n spot instances (n is a predefined parameter). A larger n gives a higher probability of benefiting from the spot instances, while a smaller n gives a higher probability of meeting the deadline requirement and reduces the optimization overhead. In our experiments, we find that n = 2 is sufficient for obtaining good optimization results. A larger n greatly increases the optimization overhead with only very small improvement on the optimization results.

It is a challenging task to develop an efficient and effective approach for hybrid instance configuration refinement. First, coupled with the performance dynamics, it is a non-trivial task to compare whether one hybrid instance configuration is better than the other in terms of cost and performance. Second, since the cloud provider usually offers multiple instance types and a wide range of spot prices, we are facing a large space for finding the suitable spot instance type and spot price.
To address these two challenging issues, we develop efficient and effective heuristics, described in the remainder of this section. When refining a hybrid instance configuration C_orig of a task to a hybrid instance configuration C_refined, we need to determine whether C_refined is better than C_orig in terms of monetary cost and execution time distributions. Particularly, we have the following two considerations, and we accept the refined configuration C_refined only if both are satisfied.

1) Probabilistic deadline guarantee consideration. C_refined should not violate the probabilistic deadline guarantee of the entire workflow;
2) Monetary cost reduction. The estimated monetary cost of C_refined should be less than that of C_orig.

Probabilistic deadline guarantee consideration. A naive way is to first calculate the probabilistic distribution of the entire workflow's execution time under the refined configuration C_refined and then to decide whether the probabilistic deadline requirement is met. However, this calculation introduces large overhead. We implement this process in the Oracle algorithm presented in Section 4. In Dyna, we propose a light-weight localized heuristic to reduce the overhead. As the on-demand configurations (i.e., the initial hybrid instance configuration) of each task found in the A*-based instance configuration step have already ensured the probabilistic deadline requirement, we only need to make sure that the refined hybrid instance configuration C_refined of each task satisfies C_refined >= C_orig, where >= is defined in Definition 1. Fig. 6 illustrates this definition. The integrals are represented as cumulative distribution functions (CDFs). With this heuristic, when evaluating the probabilistic deadline guarantee consideration for a refined configuration, we only need to calculate the probabilistic distribution of the execution time of a task rather than the entire workflow, and thus greatly reduce the optimization overhead.

Definition 1. Given two hybrid instance configurations C_1 and C_2 of task T, we have C_2 >= C_1 if for all t,

    ∫_0^t P_{T,C_2}(time = x) dx >= ∫_0^t P_{T,C_1}(time = x) dx,

where P_{T,C_1} and P_{T,C_2} are the PDFs of task T under configurations C_1 and C_2, respectively.

In order to compare two hybrid instance configurations according to Definition 1, we first discuss how to estimate the execution time distribution of a task under a refined configuration.

Otherwise, the overall execution time equals the time t_f that task T has run on the spot instance before it fails, plus the execution time t_o of task T on the on-demand instance, with the following probability:

    P_{T,C_refined}(time = t_f + t_o) = P_{T,type_1}(time > t_f) · P_{T,type_2}(time = t_o) · (1 − p_suc).    (2)

Now we discuss how to calculate p_suc. Since a spot instance may fail at any time, we define a probabilistic function ffp(t, p) to calculate the probability that a spot instance fails at time t for the first time when the bidding price is set to p. Existing studies have demonstrated that the spot prices can be predicted using statistics models [17] or reverse engineering [18]. We use the recent spot price history as a prediction of the real spot price for ffp(t, p) to calculate the failing probability. We obtain that function with a Monte-Carlo based approach. Starting from a random point in the price history, if the price history becomes larger than p at time t for the first time, we add one to the counter count. We repeat this process NUM times (NUM is sufficiently large) and return count/NUM as the failing probability. Using the ffp function, we define p_suc as follows:

    p_suc = 1 − ∫_0^{t_s} ffp(x, P_b) dx.    (3)

After obtaining the execution time distribution of a task under the refined hybrid instance configuration C_refined, we compare it with the configuration C_orig according to Definition 1. If C_refined >= C_orig is satisfied, the probabilistic deadline guarantee consideration is satisfied.

Monetary cost reduction. We estimate the monetary cost of a hybrid instance configuration of a task as the sum of the cost spent on the spot instance and the cost on the on-demand instance. Using Equations (1)-(3), we calculate the expected monetary cost of configuration C_refined in Equation (4). Note that we use the bidding price P_b to approximate the spot price in calculating the cost on spot instances. This calculation gives an upper bound of the actual expected monetary cost of the refined configuration and thus assures the correctness when considering the monetary cost reduction. If the estimated monetary cost of the refined configuration is lower than the monetary cost of the original configuration, the monetary cost reduction consideration is satisfied.
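The Monte-Carlo estimation of ffp and p_suc (Equation (3)), together with the dominance test of Definition 1, can be sketched as follows (a simplified discrete-time version under our own assumptions; names and the discretization of one price-history entry per unit of task time are ours, not the paper's):

```python
import random

# Hypothetical sketch of the Monte-Carlo estimate of ffp(t, p) described
# above. ffp_table(...)[t] approximates the probability that the spot price
# first exceeds the bid exactly t steps after the instance is acquired.

def ffp_table(price_history, bid, horizon, num=10000):
    counts = [0] * horizon
    n = len(price_history)
    for _ in range(num):
        start = random.randrange(n)  # random starting point in the history
        for t in range(horizon):
            if price_history[(start + t) % n] > bid:
                counts[t] += 1  # first out-of-bid event at time t
                break
    return [c / num for c in counts]

def p_suc(price_history, bid, t_s, num=10000):
    # Equation (3), discretized: probability that no out-of-bid event
    # happens during the task's execution time t_s on the spot instance.
    return 1.0 - sum(ffp_table(price_history, bid, t_s, num))

def dominates(samples_refined, samples_orig, grid):
    # Definition 1 on empirical samples: C_refined dominates C_orig if its
    # CDF is pointwise at least that of the original configuration.
    def cdf(samples, t):
        return sum(x <= t for x in samples) / len(samples)
    return all(cdf(samples_refined, t) >= cdf(samples_orig, t) for t in grid)
```

A refined configuration passes the probabilistic deadline guarantee consideration in this sketch when dominates(...) returns True on a grid of candidate deadlines.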
Iperf [38]. We find that the network bandwidth between instances of different types is generally lower than that between instances of the same type and S3.

Workflows. There have been some studies on characterizing the performance behaviours of scientific workflows [19]. In this paper, we consider three common workflow structures, namely Ligo, Montage and Epigenomics. The three workflows have different structures and parallelism.

We create instances of Montage workflows using the Montage source code. The input data is the 2MASS J-band images covering 8-degree by 8-degree areas retrieved from the Montage archive. The number of tasks in the workflow is 10,567. The input data size is 4 GB, where each of the 2,102 tasks on the first level of the workflow structure reads an input image of 2 MB. Initially, the input data is stored in Amazon S3 storage. Since Ligo and Epigenomics are not open-sourced, we construct synthetic Ligo and Epigenomics workflows using the workflow generator provided by Pegasus [39]. We use the DAX files with 1,000 and 997 tasks (Inspiral_1000.xml and Epigenomics_997.xml [39]) for Ligo and Epigenomics, respectively. The input data size of Ligo is 9.3 GB, where each of the 229 tasks on the first level of the workflow structure reads 40.5 MB of input data on average. The input data size of Epigenomics is 1.7 GB, where each of the seven tasks on the first level of the workflow structure reads 369 MB of DNA sequence data on average.

Implementations. In order to evaluate the effectiveness of the proposed techniques in Dyna, we have implemented the following algorithms.

• Static. This approach is the same as the previous study in [1], which only adopts on-demand instances. We adopt it as the state-of-the-art comparison. For a fair comparison, we set the workflow deadline according to the probabilistic QoS setting used in Dyna. For example, if the user requires a 90 percent probabilistic deadline guarantee, the deterministic deadline used for Static is set to the 90th percentile of the workflow's execution time distribution.
• DynaNS. This approach is the same as Dyna except that DynaNS does not use any spot instances. The comparison between Dyna and DynaNS is to assess the impact of spot instances.
• SpotOnly. This approach adopts only spot instances during execution. It first utilizes the A*-based instance configuration approach to decide the instance type for each task in the workflow. Then we set the bidding price of each task to be very high (in our studies, we set it to be $1,000) in order to guarantee the probabilistic deadline requirement.
• Oracle. We implement the Oracle method to assess the trade-off between the optimization overhead and the effectiveness of the optimizations in Dyna. Oracle is different from Dyna in that Oracle does not adopt the localized heuristic of Definition 1 (Section 3.3) when evaluating the probabilistic deadline guarantee consideration. This is an offline approach, since the time overhead of getting the solution in Oracle is prohibitively high.
• MOHEFT. We select a state-of-the-art multi-objective approach [40] for comparison. According to the previous study [40], MOHEFT is able to search the instance configuration space and obtain a set of non-dominated solutions on the monetary cost and execution time.

We conduct our experiments on both real clouds and a simulator. These two approaches are complementary, because some scientific workflows (such as Ligo and Epigenomics) are not publicly available. Specifically, when the workflows (including the input data and executables, etc.) are publicly available, we run them on public clouds. Otherwise, we simulate the execution with synthetic workflows according to the workflow characteristics from existing studies [19].

On Amazon EC2, we adopt a popular workflow management system (Pegasus [41]) to manage the execution of workflows. We create an Amazon Machine Image (AMI) installed with Pegasus and its prerequisites such as DAGMan [42] and Condor [43]. We modify the Pegasus (release 4.3.2) scheduler to enable scheduling the tasks onto instances according to the hybrid instance configurations. A script written with the Amazon EC2 API is developed for acquiring and releasing instances at runtime.

We develop a simulator based on CloudSim [44]. We mainly present our new extensions, and more details on cloud simulations can be found in the original paper [44]. The simulator includes three major components, namely Cloud, Instance and Workflow. The Cloud component maintains a pool of resources which supports acquisition and release of Instance components. It also maintains the I/O and network performance histograms measured from Amazon EC2 to simulate cloud dynamics. A spot price trace obtained from the Amazon EC2 history is also maintained to simulate the price dynamics. The Instance component simulates the on-demand and spot instances, with cloud dynamics from the calibration. We simulate the cloud dynamics at the granularity of seconds, which means the average I/O and network performance per second conform to the distributions from calibration. The Workflow component manages the workflow structures and the scheduling of tasks onto the simulated instances.

Experimental settings. We acquire the four measured types of instances from the US East region using the created AMI. The hourly costs of the on-demand instances for the four instance types are shown in Table 1. Those four instance types have also been used in previous studies [15]. As for the instance acquisition time (lag), our experiments show that each on-demand instance acquisition costs 2 minutes and each spot instance acquisition costs 7 minutes on average. This is consistent with the existing studies [45].

The deadline of workflows is an important factor for the candidate space of determining the instance configuration. There are two deadline settings of particular interest: D_min and D_max, the expected execution time of all the tasks on the critical path of the workflow when run entirely on m1.xlarge and on m1.small instances, respectively. By default, we set the deadline to be (D_min + D_max)/2.

We assume there are many workflows submitted by the users to the WaaS provider. In each experiment, we submit 100 jobs of the same workflow structure to the cloud. We assume the job arrival conforms to a Poisson distribution.
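The two deadline settings above can be made concrete with a short sketch (hypothetical helper names; d_min and d_max would come from the critical-path estimates on m1.xlarge and m1.small, and makespan_samples is an empirical execution time distribution):

```python
# Hypothetical sketch of the deadline settings described above.

def default_deadline(d_min, d_max):
    # Default workflow deadline: the midpoint (D_min + D_max) / 2.
    return (d_min + d_max) / 2.0

def static_deadline(makespan_samples, guarantee=0.9):
    # Deterministic deadline used by Static: the `guarantee`-percentile of
    # the workflow's execution time distribution (90th for a 90% guarantee).
    xs = sorted(makespan_samples)
    idx = min(int(guarantee * len(xs)), len(xs) - 1)
    return xs[idx]
```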
TABLE 2
Parameters of I/O Performance Distributions

TABLE 3
Gamma Distribution Parameters on Bandwidth between an Instance and S3

Fig. 11. Histogram of the spot price history in August 2013, US East Region of Amazon EC2.

TABLE 4
Optimization Overhead of the Compared Algorithms on Montage, Ligo and Epigenomics Workflows (Seconds)

Fig. 13. The normalized average monetary cost and average execution time results of sensitivity studies on deadline.

Fig. 15. The normalized average monetary cost and average execution time results of sensitivity studies on the probabilistic deadline guarantees.

DynaNS, SpotOnly, Dyna and Oracle are able to guarantee the probabilistic deadline requirement.
Finally, we analyze the optimization overhead of the
compared algorithms. The optimization overhead results
are shown in Table 4. Note that, for workflows with the
same structure and profile, our system only needs to perform the
optimization once. Although Oracle obtains a smaller monetary
cost than Dyna, the optimization overhead of Oracle is
16-44 times as high as that of Dyna. This shows that Dyna is
able to find optimization results close to the optimal results
in much shorter time. Due to the long execution time of the
Oracle optimization, in the rest of the experiments, we do
not evaluate Oracle but only compare Dyna with Static,
DynaNS and SpotOnly.
TABLE 5
Statistics on Spot Prices ($/hour, December 2011, Asia Pacific
Region) and On-Demand Prices of Amazon EC2
under project 1002-IRIS-09. This work is partly supported by a MoE AcRF Tier 1 grant (MOE 2014-T1-001-145) in Singapore. Amelie Chi Zhou is also with Nanyang Environment and Water Research Institute (NEWRI). Amelie Chi Zhou is the corresponding author.

REFERENCES

[1] M. Mao and M. Humphrey, "Auto-scaling to minimize cost and meet application deadlines in cloud workflows," in Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal., 2011, pp. 1–12.
[2] M. Malawski, G. Juve, E. Deelman, and J. Nabrzyski, "Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds," in Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal., 2012, pp. 1–11.
[3] A. C. Zhou, B. He, and S. Ibrahim, "A taxonomy and survey on eScience as a service in the cloud," arXiv preprint arXiv:1407.7360, 2014.
[4] J. Yu, R. Buyya, and C. K. Tham, "Cost-based scheduling of scientific workflow application on utility grids," in Proc. 1st Int. Conf. E-Science Grid Comput., 2005, pp. 8147.
[5] R. Sakellariou, H. Zhao, E. Tsiakkouri, and M. D. Dikaiakos, "Scheduling workflows with budget constraints," in Proc. CoreGRID, 2007, pp. 189–202.
[6] R. Duan, R. Prodan, and T. Fahringer, "Performance and cost optimization for multiple large-scale grid workflow applications," in Proc. ACM/IEEE Conf. Supercomput., 2007.
[7] S. Abrishami, M. Naghibzadeh, and D. H. J. Epema, "Deadline-constrained workflow scheduling algorithms for IaaS clouds," Future Generation Comput. Syst., vol. 29, pp. 15169, 2013.
[8] E.-K. Byun, Y.-S. Kee, J.-S. Kim, and S. Maeng, "Cost optimized provisioning of elastic resources for application workflows," Future Gen. Comput. Syst., vol. 27, pp. 1011–1026, 2011.
[9] S. Maguluri, R. Srikant, and L. Ying, "Stochastic models of load balancing and scheduling in cloud computing clusters," in Proc. IEEE INFOCOM, 2012, pp. 702–710.
[10] F. Zhang, J. Cao, K. Hwang, and C. Wu, "Ordinal optimized scheduling of scientific workflows in elastic compute clouds," in Proc. IEEE 3rd Int. Conf. Cloud Comput. Technol. Sci., 2011, pp. 9–17.
[11] A. C. Zhou and B. He, "Simplified resource provisioning for workflows in IaaS clouds," in Proc. IEEE 6th Int. Conf. Cloud Comput. Technol. Sci., 2014, pp. 650–655.
[12] J. Schad, J. Dittrich, and J.-A. Quiané-Ruiz, "Runtime measurements in the cloud: Observing, analyzing, and reducing variance," Proc. VLDB Endowment, vol. 3, pp. 460–471, 2010.
[13] A. Iosup, S. Ostermann, N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema, "Performance analysis of cloud computing services for many-tasks scientific computing," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 6, pp. 931–945, Jun. 2011.
[14] H. Wang, Q. Jing, R. Chen, B. He, Z. Qian, and L. Zhou, "Distributed systems meet economics: Pricing in the cloud," in Proc. HotCloud, 2010, pp. 1–7.
[15] S. Yi, A. Andrzejak, and D. Kondo, "Monetary cost-aware checkpointing and migration on Amazon cloud spot instances," IEEE Trans. Services Comput., vol. 5, no. 4, pp. 512–524, 4th Quarter 2012.
[16] M. Mazzucco and M. Dumas, "Achieving performance and availability guarantees with spot instances," in Proc. IEEE 13th Int. Conf. High Perform. Comput. Commun., 2011, pp. 296–303.
[17] B. Javadi, R. Thulasiram, and R. Buyya, "Statistical modeling of spot instance prices in public cloud environments," in Proc. IEEE 4th Int. Conf. Utility Cloud Comput., 2011, pp. 219–228.
[18] O. Agmon Ben-Yehuda, M. Ben-Yehuda, A. Schuster, and D. Tsafrir, "Deconstructing Amazon EC2 spot instance pricing," in Proc. IEEE 3rd Int. Conf. Cloud Comput. Technol. Sci., 2011, pp. 304–311.
[19] G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta, and K. Vahi, "Characterizing and profiling scientific workflows," Future Gen. Comput. Syst., vol. 29, pp. 682–692, 2013.
[20] L. Abeni and G. Buttazzo, "QoS guarantee using probabilistic deadlines," in Proc. Euromicro Conf. Real-Time Syst., 1999, pp. 242–249.
[21] R. N. Calheiros and R. Buyya, "Meeting deadlines of scientific workflows in public clouds with tasks replication," IEEE Trans. Parallel Distrib. Syst., 2013, pp. 1786–1796.
[22] H. Kloh, B. Schulze, R. Pinto, and A. Mury, "A bi-criteria scheduling process with CoS support on grids and clouds," Concurrency Computat.: Pract. Exp., vol. 24, pp. 1443–1460, 2012.
[23] I. M. Sardiña, C. Boeres, and L. M. De A. Drummond, "An efficient weighted bi-objective scheduling algorithm for heterogeneous systems," in Proc. Int. Conf. Parallel Process., 2009, pp. 102–111.
[24] C. Lin and S. Lu, "Scheduling scientific workflows elastically for cloud computing," in Proc. IEEE Int. Conf. Cloud Comput., 2011, pp. 746–747.
[25] S. Di, C.-L. Wang, and F. Cappello, "Adaptive algorithm for minimizing cloud task length with prediction errors," IEEE Trans. Cloud Comput., vol. 2, no. 2, pp. 194–207, Apr.–Jun. 2014.
[26] M. Rodriguez and R. Buyya, "Deadline based resource provisioning and scheduling algorithm for scientific workflows on clouds," IEEE Trans. Cloud Comput., vol. 2, no. 2, pp. 222–235, Apr.–Jun. 2014.
[27] D. de Oliveira, V. Viana, E. Ogasawara, K. Ocana, and M. Mattoso, "Dimensioning the virtual cluster for parallel scientific workflows in clouds," in Proc. 4th ACM Workshop Sci. Cloud Comput., 2013, pp. 5–12.
[28] D. Oliveira, K. A. Ocaña, F. Baião, and M. Mattoso, "A provenance-based adaptive scheduling heuristic for parallel scientific workflows in clouds," J. Grid Comput., vol. 10, pp. 521–552, 2012.
[29] N. Roy, A. Dubey, and A. Gokhale, "Efficient autoscaling in the cloud using predictive models for workload forecasting," in Proc. IEEE Int. Conf. Cloud Comput., 2011, pp. 500–507.
[30] J. Yang, J. Qiu, and Y. Li, "A profile-based approach to just-in-time scalability for cloud applications," in Proc. IEEE Int. Conf. Cloud Comput., 2009, pp. 9–16.
[31] A. C. Zhou and B. He, "Transformation-based monetary cost optimizations for workflows in the cloud," IEEE Trans. Cloud Comput., vol. 2, no. 1, pp. 85–98, Jan.–Mar. 2013.
[32] S. Ostermann and R. Prodan, "Impact of variable priced cloud resources on scientific workflow scheduling," in Proc. 18th Int. Conf. Parallel Process., 2012, pp. 350–362.
[33] B. Javadi, R. K. Thulasiram, and R. Buyya, "Characterizing spot price dynamics in public cloud environments," Future Gen. Comput. Syst., vol. 29, pp. 988–999, 2013.
[34] H.-Y. Chu and Y. Simmhan, "Cost-efficient and resilient job life-cycle management on hybrid clouds," in Proc. IEEE 28th Int. Parallel Distrib. Process. Symp., 2014, pp. 327–336.
[35] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron, "Towards predictable datacenter networks," in Proc. ACM SIGCOMM Conf., 2011, pp. 242–253.
[36] M. Hovestadt, O. Kao, A. Kliem, and D. Warneke, "Evaluating adaptive compression to mitigate the effects of shared I/O in clouds," in Proc. IEEE Int. Symp. Parallel Distrib. Process. Workshops and PhD Forum, 2011, pp. 1042–1051.
[37] S. Ibrahim, H. Jin, L. Lu, B. He, and S. Wu, "Adaptive disk I/O scheduling for MapReduce in virtualized environment," in Proc. Int. Conf. Parallel Process., 2011, pp. 335–344.
[38] Iperf [Online]. Available: http://iperf.sourceforge.net, Jul. 2014.
[39] Workflow Generator. (2014, Jul.) [Online]. Available: https://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator
[40] J. J. Durillo, R. Prodan, and H. M. Fard, "MOHEFT: A multi-objective list-based method for workflow scheduling," in Proc. IEEE 4th Int. Conf. Cloud Comput. Technol. Sci., 2012, pp. 185–192.
[41] E. Deelman, G. Singh, M.-H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. S. Katz, "Pegasus: A framework for mapping complex scientific workflows onto distributed systems," Sci. Program., vol. 13, pp. 219–237, 2005.
[42] Condor Team, DAGMan [Online]. Available: http://cs.wisc.edu/condor/dagman, Jul. 2014.
[43] M. Litzkow, M. Livny, and M. Mutka, "Condor - a hunter of idle workstations," in Proc. 8th Int. Conf. Distrib. Comput. Syst., 1988, pp. 104–111.
[44] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, and R. Buyya, "CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Softw. Pract. Exper., vol. 41, pp. 23–50, 2011.
[45] M. Mao and M. Humphrey, "A performance study on the VM startup time in the cloud," in Proc. IEEE 5th Int. Conf. Cloud Comput., 2012, pp. 423–430.
Amelie Chi Zhou received the bachelor's and master's degrees from Beihang University. She is currently working toward the PhD degree at the School of Computer Engineering of NTU, Singapore. Her research interests include cloud computing and database systems.

Cheng Liu received the bachelor's and master's degrees from USTC. He is currently a research assistant in the School of Computer Engineering of NTU, Singapore. His areas of expertise include structured peer-to-peer networks and compilers.

Bingsheng He received the bachelor's degree in computer science from SJTU, and the PhD degree in computer science from HKUST. He is an assistant professor in the School of Computer Engineering of NTU, Singapore. His research interests include high-performance computing, cloud computing, and database systems.