Co-locating computation as close as possible to the data is an important consideration in current data-intensive systems; this is known as the data locality problem. In this paper, we analyze the impact of data locality on YARN, the new generation of Hadoop. We investigate the behavior of YARN's delay scheduler with respect to data locality for a variety of workloads and configurations. We address three problems related to data locality. First, we study the trade-off between data locality and job completion time. Second, we observe that accounting for data locality can produce an imbalance in resource allocation, which may under-utilize the cluster. Third, we address the redundant I/O operations that arise when different YARN containers request input data blocks on the same node. Additionally, we propose the YARN Locality Simulator (YLocSim), a tool that simulates the interactions between YARN components in a real cluster and reports data locality percentages in real time. We validate YLocSim on a real cluster setup and use it throughout our study.
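To make the mechanism under study concrete, the following is a minimal Python sketch of the delay-scheduling policy this kind of scheduler applies: a job passes up scheduling opportunities on non-local nodes until a skip threshold is reached, after which locality is relaxed. The Job class, the block-replica encoding, and max_skips are illustrative assumptions, not YARN's actual API.

class Job:
    def __init__(self, pending_tasks):
        # pending_tasks: task id -> set of nodes holding that task's input block
        self.pending = dict(pending_tasks)
        self.skip_count = 0

def assign_task(job, free_node, max_skips=3):
    """Pick a task to launch on free_node, preferring node-local placements."""
    # Node-local first: the input block already resides on the offered node.
    for task, replicas in job.pending.items():
        if free_node in replicas:
            del job.pending[task]
            job.skip_count = 0
            return task
    # No local task: skip this opportunity until the delay threshold is hit.
    if job.skip_count < max_skips:
        job.skip_count += 1
        return None
    # Threshold reached: relax locality and launch a remote (non-local) task.
    task = next(iter(job.pending), None)
    if task is not None:
        del job.pending[task]
    return task

job = Job({"t1": {"nodeA"}, "t2": {"nodeB"}})
print(assign_task(job, "nodeC"))  # None: opportunity skipped, waiting for locality
print(assign_task(job, "nodeA"))  # 't1': node-local assignment

The trade-off the paper measures follows directly from max_skips: a larger threshold raises the locality percentage but delays job completion.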
Task scheduling plays a key role in cloud computing systems. Tasks cannot be scheduled on the basis of a single criterion but under a set of rules that can be termed an agreement between the users and providers of the cloud. This agreement is the quality of service that the user expects from the provider. Providing good quality of service according to the agreement is a demanding task for providers, since a large number of tasks run on the provider's side at any given time. The task scheduling problem can be viewed as finding an optimal mapping of the subtasks of different tasks onto the available set of resources (processors/machines) so that the desired goals for the tasks are achieved. In this paper we perform a comparative study of different algorithms with respect to their suitability, feasibility, and adaptability in the cloud scenario, and then propose a hybrid approach that can enhance the existing platform and enable cloud providers to deliver a better quality of service.
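As a concrete (and deliberately simple) instance of this mapping view, the sketch below greedily assigns each subtask to the resource that would finish it earliest; it is illustrative only and does not represent the hybrid approach proposed in the paper.

def greedy_schedule(task_lengths, resource_speeds):
    """Map each task to the resource with the earliest finish time."""
    finish = [0.0] * len(resource_speeds)   # time at which each resource frees up
    assignment = []                          # follows the sorted task order below
    for length in sorted(task_lengths, reverse=True):   # longest tasks first
        r = min(range(len(resource_speeds)),
                key=lambda i: finish[i] + length / resource_speeds[i])
        finish[r] += length / resource_speeds[r]
        assignment.append(r)
    return assignment, max(finish)           # mapping and resulting makespan

print(greedy_schedule([4, 2, 8, 6], [1.0, 2.0]))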
Cloud computing is a computing paradigm in which platforms, scalable resources, data storage, and IT services are provided over the Internet. A cloud computing environment serves a large number of customers requesting cloud resources. Task scheduling is currently an active research topic in cloud computing: with a vast pool of resources and numerous tasks being submitted, task management becomes critical for optimal scheduling, which in turn affects the efficiency of the whole cloud computing environment. Meeting deadlines and reducing cost are the main goals when scheduling tasks on the available resources. This paper presents proposed scheduling algorithms and strategies for independent tasks and workflow applications in cloud computing.
Cloud computing is a pool of large, interrelated, and virtualized resources that users can access over the Internet on demand. Users are charged according to their use of cloud services. When scheduling tasks, we must know how much time a particular task spends obtaining its desired resource from the pool, measured from the point of its submission for execution. In this paper we propose an algorithm that calculates not only the total execution cost and total execution time but also the time each task needs to find a cost-efficient resource.
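A minimal sketch of the kind of computation described here: for each task, estimate execution time and cost on every resource in the pool and select the cheapest. The resource field names (mips, price_per_sec) are assumed for illustration.

def cost_efficient_resource(task_len, resources):
    """resources: list of dicts with 'mips' (speed) and 'price_per_sec'."""
    best, best_time, best_cost = None, None, float("inf")
    for idx, r in enumerate(resources):
        time = task_len / r["mips"]          # execution time on this resource
        cost = time * r["price_per_sec"]     # execution cost on this resource
        if cost < best_cost:
            best, best_time, best_cost = idx, time, cost
    return best, best_time, best_cost

resources = [{"mips": 500, "price_per_sec": 0.02},
             {"mips": 1000, "price_per_sec": 0.05}]
print(cost_efficient_resource(10000, resources))   # -> (0, 20.0, 0.4)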
The League Championship Algorithm (LCA) is a sports-inspired, population-based algorithmic framework for global optimization over a continuous search space, first proposed by Ali Husseinzadeh Kashan in 2009. A characteristic shared by all population-based optimization algorithms such as LCA is that they evolve a population of feasible solutions toward promising areas of the search space. In this paper, we propose a job scheduling algorithm for the infrastructure-as-a-service (IaaS) cloud based on an enhanced LCA optimization technique. Three established algorithms, namely First Come First Served (FCFS), Last Job First (LJF), and Best Effort First (BEF), were used to evaluate the performance of the proposed algorithm. All four algorithms are assumed to be non-preemptive. The parameters used in the experiments are average response time, average completion time, and makespan. The results show that the LCA scheduling algorithm performs moderately better than the other algorithms as the number of virtual machines increases.
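For context, the baselines named above are simple queue orders over the same non-preemptive model. The sketch below simulates FCFS and LJF on identical VMs and reports makespan and average completion time; the LCA itself involves league-style match-ups and is not reproduced here.

def simulate(jobs, num_vms, order):
    """jobs: run lengths in submission order; order: sort key for the queue."""
    queue = sorted(jobs, key=order)            # stable sort keeps FCFS order
    vm_free = [0.0] * num_vms
    completion = []
    for length in queue:
        v = min(range(num_vms), key=lambda i: vm_free[i])   # next free VM
        vm_free[v] += length
        completion.append(vm_free[v])
    return max(vm_free), sum(completion) / len(completion)  # makespan, avg completion

jobs = [5, 1, 8, 3, 2]
print("FCFS:", simulate(jobs, 2, order=lambda j: 0))    # submission order
print("LJF: ", simulate(jobs, 2, order=lambda j: -j))   # longest job first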
Cloud computing is an on-demand service that users can access according to their requirements through the Internet. Multiple users can request any amount of services, so scheduling those services is a crucial task in cloud computing. Scheduling is a way of assigning work to a computing resource; at any moment, multiple tasks are waiting to be allotted to multiple resources. Various optimization algorithms have been used for task scheduling so that total execution cost is minimized. In this paper, we have implemented the Jaya optimization algorithm for workflow scheduling and compared it with five nature-inspired algorithms, namely particle swarm optimization (PSO), genetic algorithm (GA), ant colony optimization (ACO), honey bee, and cat swarm optimization (CSO), keeping the fitness function the same for all of them, using CloudSim. Previous work has studied PSO, GA, ACO, honey bee, and CSO under different criteria. The results are compared on the basis of execution cost and makespan on both an independent set of tasks and a set of tasks that follows a workflow schedule. Benchmark workflows such as Montage, CyberShake, Inspiral, and Sipht are used for workflow scheduling. Jaya is observed to outperform the other algorithms, producing comparable results in the least amount of time because it converges very quickly.
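The core of Jaya is a single parameter-free update rule: each candidate moves toward the best solution and away from the worst, and the move is kept only if it improves fitness. Below is a compact sketch; the quadratic fitness is a placeholder for the execution-cost function that a scheduling encoding would supply.

import random

def jaya(fitness, dim, pop_size=20, iters=200, lo=-10.0, hi=10.0):
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iters):
        scores = [fitness(x) for x in pop]
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for i, x in enumerate(pop):
            r1, r2 = random.random(), random.random()
            # Move toward the best and away from the worst solution.
            cand = [xj + r1 * (bj - abs(xj)) - r2 * (wj - abs(xj))
                    for xj, bj, wj in zip(x, best, worst)]
            if fitness(cand) < scores[i]:      # greedy acceptance
                pop[i] = cand
    return min(pop, key=fitness)

print(jaya(lambda x: sum(v * v for v in x), dim=3))   # converges near the zero vector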
In the context of cloud computing and big data, data from all walks of life can be obtained conveniently. With the growing popularity of workflow applications, some user information in business processes needs protection, which greatly affects workflow scheduling. Moreover, since the amount of data in workflows is usually very large, data privacy protection in workflows has become an important research problem. In this paper, in order to satisfy users' data privacy requirements and minimize total scheduling cost, we propose a privacy- and cost-aware method based on a genetic algorithm for data-intensive workflow applications, which takes into account computation cost, data transmission cost, and data storage cost in the cloud when searching for the best scheduling solution. The proposed algorithm uses the sum of upward and downward rank values to prioritize workflow tasks, then merges these priorities into the initial population so that a good solution is obtained quickly. A series of operations, namely selection, crossover, and mutation, is then used to optimize the schedule. During scheduling, tasks needing privacy protection are assigned a fixed datacenter, and their data cannot be moved or copied to other datacenters. Finally, we demonstrate the potential of the proposed algorithm for optimizing economic cost under user privacy requirements. The experimental results show that the proposed algorithm improves scheduling, saving time and cost by an average of 3.6% and 15.6%, respectively.
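The rank-based seeding can be made concrete with a small sketch: upward rank is the longest average-cost path from a task to an exit node, downward rank the longest path from an entry node to the task, and their sum orders the tasks. The DAG encoding below is an illustrative assumption.

def upward_rank(succ, w, c, t, memo=None):
    """succ: task -> successors; w: avg compute cost; c: (u, v) -> comm cost."""
    memo = {} if memo is None else memo
    if t not in memo:
        memo[t] = w[t] + max((c.get((t, s), 0) + upward_rank(succ, w, c, s, memo)
                              for s in succ.get(t, [])), default=0)
    return memo[t]

def downward_rank(pred, w, c, t, memo=None):
    memo = {} if memo is None else memo
    if t not in memo:
        memo[t] = max((downward_rank(pred, w, c, p, memo) + w[p] + c.get((p, t), 0)
                       for p in pred.get(t, [])), default=0)
    return memo[t]

succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
pred = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
w = {"A": 3, "B": 4, "C": 2, "D": 1}
c = {("A", "B"): 1, ("A", "C"): 2, ("B", "D"): 2, ("C", "D"): 1}
prio = {t: upward_rank(succ, w, c, t) + downward_rank(pred, w, c, t) for t in w}
print(sorted(prio, key=prio.get, reverse=True))   # task priority order by summed rank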
Cloud computing involves networks of servers that operate on huge amounts of data; these servers and other cloud resources can be located in widely distributed, remote areas. Different scientific and web applications are represented using workflow models in the cloud. Scheduling such workflows in a multi-cloud environment is a major concern, since they are large and follow specific scientific standards. Meeting users' quality of service (QoS) requirements in public cloud computing, such as scalability and reliability, while maximizing resource utilization for end users, is a further issue. This paper compares Particle Swarm Optimization (PSO) based algorithms in terms of makespan and cost. These algorithms were tested with the same number of virtual machines (VMs) and workflows. The comparison is intended to help users decide which of these algorithms can provide the required QoS for large scientific workflows on an infrastructure-as-a-service (IaaS) cloud platform, and to help them map tasks to resources. The algorithms are simulated in different simulation packages and tested with scientific workflow datasets such as LIGO, Montage, CyberShake, and Epigenome. The algorithms considered in this article can effectively distribute tasks to available resources for efficient optimization of makespan and cost. Simulation experiments reveal that ACO-PSO outperforms the basic PSO, C-PSO, and PSO-DS in the same working environment.
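Although the compared variants differ in their perturbation operators, they share a common skeleton: a particle is a real-valued vector decoded into a task-to-VM assignment and scored on makespan and cost. The following sketch shows that skeleton for basic PSO; the decoding and the weighted fitness are illustrative assumptions, not the exact schemes of ACO-PSO, C-PSO, or PSO-DS.

import random

def decode(position, num_vms):
    return [int(p) % num_vms for p in position]    # task index -> VM index

def fitness(position, task_lens, vm_speeds, vm_prices, alpha=0.5):
    assign = decode(position, len(vm_speeds))
    vm_time, cost = [0.0] * len(vm_speeds), 0.0
    for t, v in enumerate(assign):
        run = task_lens[t] / vm_speeds[v]
        vm_time[v] += run
        cost += run * vm_prices[v]
    return alpha * max(vm_time) + (1 - alpha) * cost   # weighted makespan + cost

def pso(task_lens, vm_speeds, vm_prices, swarm=20, iters=100):
    dim, num_vms = len(task_lens), len(vm_speeds)
    f = lambda x: fitness(x, task_lens, vm_speeds, vm_prices)
    X = [[random.uniform(0, num_vms) for _ in range(dim)] for _ in range(swarm)]
    V = [[0.0] * dim for _ in range(swarm)]
    P = [x[:] for x in X]                      # personal bests
    G = min(P, key=f)[:]                       # global best
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                V[i][d] = (0.7 * V[i][d] + 1.5 * r1 * (P[i][d] - X[i][d])
                           + 1.5 * r2 * (G[d] - X[i][d]))
                X[i][d] += V[i][d]
            if f(X[i]) < f(P[i]):
                P[i] = X[i][:]
        G = min(P, key=f)[:]
    return decode(G, num_vms), f(G)

print(pso([8, 3, 5, 2], [1.0, 2.0], [0.01, 0.03]))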
Cloud computing is a type of Internet-based computing that provides resources to store, manage, and process data over the Internet. It is a computing platform that resides in a service provider's large data centre. It is a dynamic environment that provides services, typically Infrastructure as a Service, Platform as a Service, and Software as a Service, which address a wide range of client needs. Workflow scheduling is a major factor influencing system performance in a cloud computing environment. Cloud service providers and consumers have different objectives and requirements, and the load and availability of resources vary dynamically with time. Workflow scheduling discovers resources and allocates tasks to suitable resources, and it plays an important role in workflow management. Scheduling problems belong to a broad class of optimization problems aimed at finding an optimal matching of tasks to different sets of resources. The primary objective of this work is to derive an improved particle swarm optimization approach for mapping tasks to compute resources such that the total cost is minimized.
Today we see significantly increased use of the problem-oriented approach in developing scheduling algorithms for cloud computing environments, and several such algorithms already exist. However, many of them require that the tasks within a single job be independent, and they do not account for each task's execution time or the volume of data transmitted. We propose a model of a problem-oriented cloud environment. Using this model, we propose a list-based algorithm, built on the Heterogeneous Earliest-Finish-Time (HEFT) algorithm, for problem-oriented planning of application execution in a cloud environment that takes the applications' execution profiles into account.
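Since HEFT is the base of the proposed algorithm, a condensed sketch may be useful: tasks are ranked by upward rank on average costs, then each task is placed on the processor giving the earliest finish time, with communication cost charged only across processors. Insertion-based slot search is omitted for brevity.

def heft(succ, pred, W, C):
    """W[t][p]: cost of task t on processor p; C[(u, v)]: data-transfer cost."""
    nprocs = len(next(iter(W.values())))
    avg = {t: sum(costs) / nprocs for t, costs in W.items()}
    rank = {}
    def rank_u(t):                              # upward rank on average costs
        if t not in rank:
            rank[t] = avg[t] + max((C.get((t, s), 0) + rank_u(s)
                                    for s in succ.get(t, [])), default=0)
        return rank[t]
    proc_free = [0.0] * nprocs
    finish, place = {}, {}
    for t in sorted(W, key=rank_u, reverse=True):          # priority order
        best = None
        for p in range(nprocs):
            # Data from a predecessor on the same processor arrives for free.
            ready = max((finish[u] + (C.get((u, t), 0) if place[u] != p else 0)
                         for u in pred.get(t, [])), default=0)
            eft = max(ready, proc_free[p]) + W[t][p]       # earliest finish time
            if best is None or eft < best[0]:
                best = (eft, p)
        finish[t], place[t] = best
        proc_free[best[1]] = best[0]
    return place, max(finish.values())                     # placement and makespan

succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
pred = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
W = {"A": [2, 3], "B": [3, 5], "C": [4, 2], "D": [2, 2]}
C = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 2, ("C", "D"): 1}
print(heft(succ, pred, W, C))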
Efficient task scheduling is one of the major steps toward effectively harnessing the potential of cloud computing, where a number of tasks may need to be scheduled on different virtual machines to minimize makespan and increase system utilization. The task scheduling problem is NP-complete, so finding an exact solution is intractable, especially for large numbers of tasks. This paper presents a Discrete Symbiotic Organism Search (DSOS) algorithm for optimal scheduling of tasks on cloud resources. Symbiotic Organism Search (SOS) is a recently developed metaheuristic optimization technique for numerical optimization problems that mimics the symbiotic relationships (mutualism, commensalism, and parasitism) exhibited by organisms in an ecosystem. Simulation results reveal that DSOS outperforms Particle Swarm Optimization (PSO), one of the most popular heuristic optimization techniques for task scheduling problems. DSOS converges faster as the search space grows, which makes it suitable for large-scale scheduling problems. A t-test analysis of the proposed method showed that DSOS performs significantly better than PSO, particularly for large search spaces.
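Of the three SOS phases, mutualism illustrates the mechanics well: two organisms move toward the best solution relative to their mutual vector, scaled by random benefit factors. The continuous form is sketched below; DSOS additionally discretises positions into task-to-VM mappings, which is not shown here.

import random

def mutualism_phase(pop, scores, fitness, best):
    """One SOS mutualism pass over the population (in-place, minimisation)."""
    for i in range(len(pop)):
        j = random.choice([k for k in range(len(pop)) if k != i])
        xi, xj = pop[i], pop[j]
        mv = [(a + b) / 2 for a, b in zip(xi, xj)]               # mutual vector
        bf1, bf2 = random.choice([1, 2]), random.choice([1, 2])  # benefit factors
        ni = [a + random.random() * (g - m * bf1) for a, m, g in zip(xi, mv, best)]
        nj = [b + random.random() * (g - m * bf2) for b, m, g in zip(xj, mv, best)]
        if fitness(ni) < scores[i]:              # keep only improving moves
            pop[i], scores[i] = ni, fitness(ni)
        if fitness(nj) < scores[j]:
            pop[j], scores[j] = nj, fitness(nj)

f = lambda x: sum(v * v for v in x)              # placeholder cost function
pop = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(6)]
scores = [f(x) for x in pop]
mutualism_phase(pop, scores, f, min(pop, key=f)[:])
print(min(scores))                               # best score after one pass

Commensalism and parasitism follow the same keep-if-better pattern with different interaction vectors.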
" Problem-solving environments " recently became a widely accepted approach to providing computational resources to solve complex eScience problems. This approach represents a problem as a work-flow, orchestrating a set of various... more
" Problem-solving environments " recently became a widely accepted approach to providing computational resources to solve complex eScience problems. This approach represents a problem as a work-flow, orchestrating a set of various computational services. The existing cloud computing resources planning methods do not take into account a relation between such services, problem domain specifics, predicted work-flow execution timespan, etc. On the other hand, usage of cloud system provides efficient HPC resources usage, distributing tasks on the most suitable resources. Therefore, we need to develop algorithms that provide efficient cloud system resources usage and take into account domain-specific information of the problem.
Workflow scheduling involves mapping large tasks onto cloud resources to improve scheduling efficiency. This has attracted the interest of many researchers, who have devoted their time and resources to improving the performance of scheduling in cloud computing. However, scientific workflows are big-data applications, so their execution is expensive and time-consuming. To address this issue, we have extended our previous work, the Cost Optimised Heuristic Algorithm (COHA), and present a novel workflow scheduling algorithm named Multi-Objective Workflow Optimization Strategy (MOWOS) that jointly reduces execution cost and makespan. MOWOS employs a task-splitting mechanism that divides large tasks into sub-tasks to reduce their scheduling length. Moreover, two new algorithms, called MaxVM selection and MinVM selection, are presented in MOWOS for task allocation. The design purpose of MOWOS is to enable all tasks to successfully meet their deadlines at a reduced time and budget. We ...
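A hedged sketch of the task-splitting idea: tasks longer than a threshold are divided into equal-sized sub-tasks so each piece can be scheduled into a shorter slot. The fixed-threshold rule is an assumption for illustration; the paper's exact splitting criterion may differ.

def split_tasks(task_lens, threshold):
    subtasks = []
    for tid, length in enumerate(task_lens):
        if length <= threshold:
            subtasks.append((tid, length))
        else:
            pieces = -(-length // threshold)                 # ceiling division
            for k in range(pieces):
                part = min(threshold, length - k * threshold)
                subtasks.append((tid, part))                 # (parent id, sub-length)
    return subtasks

print(split_tasks([5, 23, 9], threshold=10))
# -> [(0, 5), (1, 10), (1, 10), (1, 3), (2, 9)]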
Cloud computing is an emerging technology that expands the boundary of the Internet by using centralized servers to maintain data and resources. It enables users and consumers to use various applications provided by the cloud provider. One of the major problems it faces is workflow scheduling: the scheduling algorithm employed to map users' requests to the appropriate available resources. Traditionally, such scheduling is performed manually by IT staff; workflow scheduling automates this mapping by means of an algorithm.
Scheduling tasks with precedence constraints on a set of resources with different performance characteristics is a well-known NP-complete problem, and a number of effective heuristics have been proposed to solve it. If the start time and the deadline of each workflow are known (for example, if a workflow starts execution upon periodic data arriving from sensors, and its execution should be completed before the next data acquisition), the problem of scheduling multiple deadline-constrained workflows arises. Taking into account that resource providers may give only restricted access to their computational capabilities, we consider the case where resources are only partially available for workflow execution. To address this problem, we study the scheduling of deadline-constrained scientific workflows in a non-dedicated heterogeneous environment. In this paper, we introduce three scheduling algorithms for mapping the tasks of multiple workflows with different deadlines onto a static set of resources with previously known free time windows. Simulation experiments show that scheduling strategies based on the proposed staged scheme give better results than a merge-based approach that considers all workflows at once.
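The partial-availability constraint can be pictured with a small sketch: a task of known duration must be placed into the earliest free time window on a resource that can hold it before the deadline. The interval encoding is an illustrative assumption.

def place_in_window(windows, duration, ready, deadline):
    """windows: sorted (start, end) free intervals on one resource."""
    for start, end in windows:
        begin = max(start, ready)               # cannot start before the task is ready
        if begin + duration <= min(end, deadline):
            return begin, begin + duration      # feasible placement found
    return None                                 # no window fits before the deadline

windows = [(0, 4), (6, 15), (20, 30)]
print(place_in_window(windows, duration=5, ready=3, deadline=14))   # (6, 11)
print(place_in_window(windows, duration=12, ready=0, deadline=25))  # None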
We solve workflow scheduling problems in e-Science networks, where the goal is to minimize either makespan or network resource consumption by jointly scheduling heterogeneous resources such as compute and network resources. We formulate the workflow scheduling problem, incorporating multiple paths, as a mixed integer linear program (MILP) and develop several linear programming relaxation heuristics based on this formulation. Our algorithms allow dynamic multiple paths for data transfer between tasks and more flexible resource allocation that may vary over time. We evaluate our algorithms against a well-known list scheduling algorithm on relatively small e-Science networks. Our simulation results show that our heuristics are fast and work well when communication-to-computation ratios (CCRs) are small. They also show that the use of dynamic multiple paths and malleable resource allocation is useful for data-intensive applications.
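To illustrate the flavour of such a formulation, here is a toy MILP: binary variables assign tasks to resources and a makespan variable is minimised. Paths, time-varying allocation, and network variables from the paper's model are omitted; the sketch uses the PuLP modelling library purely for illustration.

import pulp

task_lens = [4, 6, 3]
speeds = [1.0, 2.0]

prob = pulp.LpProblem("workflow_milp", pulp.LpMinimize)
x = {(t, r): pulp.LpVariable(f"x_{t}_{r}", cat="Binary")
     for t in range(len(task_lens)) for r in range(len(speeds))}
makespan = pulp.LpVariable("makespan", lowBound=0)

prob += makespan                                            # minimise makespan
for t in range(len(task_lens)):                             # each task runs exactly once
    prob += pulp.lpSum(x[t, r] for r in range(len(speeds))) == 1
for r in range(len(speeds)):                                # resource load bounds makespan
    prob += pulp.lpSum(x[t, r] * task_lens[t] / speeds[r]
                       for t in range(len(task_lens))) <= makespan

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(makespan),
      {t: r for (t, r), v in x.items() if v.value() == 1})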
When orchestrating Web service workflows, the geographical placement of the orchestration engine(s) can greatly affect workflow performance: data may have to be transferred across long geographical distances, which increases execution time and degrades overall workflow performance. In this paper, we present a framework that, given a DAG-based workflow specification, computes the optimal Amazon EC2 cloud regions in which to deploy the orchestration engines and execute a workflow. The framework incorporates a constraint model, generated using an automated constraint modelling system, that solves the workflow deployment problem. The feasibility of the framework is evaluated by executing sample workflows representative of scientific workloads. The experimental results indicate that the framework reduces workflow execution time and provides a speedup of 1.3x-2.5x over centralised approaches.
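A much-simplified sketch of the deployment decision the framework automates: pick the engine region minimising total estimated transfer time to the services a workflow invokes. The latency table and the brute-force search are illustrative assumptions; the paper instead generates a constraint model automatically.

regions = ["us-east-1", "eu-west-1", "ap-southeast-1"]
# transfer_time[engine_region][service_region]: seconds per data unit (assumed values)
transfer_time = {
    "us-east-1":      {"us-east-1": 0.1, "eu-west-1": 0.8, "ap-southeast-1": 1.5},
    "eu-west-1":      {"us-east-1": 0.8, "eu-west-1": 0.1, "ap-southeast-1": 1.2},
    "ap-southeast-1": {"us-east-1": 1.5, "eu-west-1": 1.2, "ap-southeast-1": 0.1},
}
# Workflow invocations: (region of the invoked service, data volume moved)
workflow = [("us-east-1", 10), ("eu-west-1", 4), ("us-east-1", 6)]

def deployment_cost(engine_region):
    return sum(vol * transfer_time[engine_region][svc] for svc, vol in workflow)

best = min(regions, key=deployment_cost)
print(best, deployment_cost(best))   # region with the lowest estimated transfer time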