Due to the advances in human civilization, problems in science and engineering are becoming more complicated than ever before. To solve these complicated problems, grid computing has become a popular tool. A grid environment collects, integrates, and uses heterogeneous or homogeneous resources scattered around the globe and connected by a high-speed network. Grid environments can be classified into two types: computing grids and data grids. This paper mainly focuses on computing grids. In a computing grid, job scheduling is a very important task. A good scheduling algorithm can assign jobs to resources efficiently and can balance the system load. In this paper, we propose a hierarchical framework and a job scheduling algorithm called Hierarchical Load Balanced Algorithm (HLBA) for grid environments. In our algorithm, we use the system load as a parameter in determining a balance threshold, and the scheduler adapts the balance threshold dynamically when the system load changes. The main contributions of this paper are twofold: first, the scheduling algorithm balances the system load with an adaptive threshold; second, it minimizes the makespan of jobs. Experimental results show that HLBA performs better than other algorithms. ► A hierarchical framework and a job scheduling algorithm for grids are proposed. ► The algorithm is called the Hierarchical Load Balanced Algorithm (HLBA). ► The main contributions are system load balancing and makespan minimization.
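The abstract does not give the details of HLBA, so the following is only a minimal sketch of the general idea of scheduling against a load threshold that adapts as the system load changes. It is not the authors' algorithm. The `Resource` class, the use of the mean load as the threshold, and the largest-job-first ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical compute resource (not taken from the paper)."""
    name: str
    speed: float              # work units processed per second
    queued_work: float = 0.0  # work units already assigned

    @property
    def load(self) -> float:
        # Estimated seconds needed to drain the current queue.
        return self.queued_work / self.speed


def balance_threshold(resources):
    # Assumption: the threshold is the current mean load, so it is
    # recomputed (adapts) every time the system load changes.
    return sum(r.load for r in resources) / len(resources)


def schedule(job_sizes, resources):
    """Assign each job to an under-threshold resource; return (assignment, makespan)."""
    assignment = []
    for size in sorted(job_sizes, reverse=True):  # largest jobs first
        threshold = balance_threshold(resources)
        # Only resources at or below the threshold are eligible; fall back
        # to all resources if rounding leaves the list empty.
        candidates = [r for r in resources if r.load <= threshold] or resources
        # Among eligible resources, pick the earliest estimated completion.
        target = min(candidates, key=lambda r: r.load + size / r.speed)
        target.queued_work += size
        assignment.append((size, target.name))
    makespan = max(r.load for r in resources)
    return assignment, makespan
```

Calling `schedule([4, 2, 2], [Resource("a", 2.0), Resource("b", 1.0)])` spreads the jobs over both resources rather than queueing everything on the faster one, which is the balancing effect the abstract describes in general terms.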
The emergence of low-cost PC clusters, together with the standardization of programming models (MPI and OpenMP), has paved the way for parallel computing to come into production use. In all domains of high performance computing, parallel execution is routinely considered one of the major sources of performance. In some domains, like computational fluid dynamics, commercial codes already offer parallel execution as an option. In new domains, like bioinformatics, parallel execution is considered early in the design of algorithms and software. Besides clusters, grid computing is receiving increasing attention.
Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher mentioned above.
In data grids, many distributed scientific and engineering applications require access to large amounts of data (terabytes or petabytes). Data access time depends on bandwidth, especially in a cluster grid, where network bandwidth within a cluster is larger than across clusters. The major bottleneck to supporting fast data access in grids is the high latency of Wide Area Networks (WANs) and the Internet. Effective scheduling in such a network architecture can reduce the amount of data transferred across the Internet by dispatching a job to where the needed data are present. Another solution is to use a data replication mechanism that generates multiple copies of the existing data, reducing the need to access data at a remote site. To utilize these two concepts, in this paper we develop a job scheduling policy, called HCS (Hierarchical Cluster Scheduling), and a dynamic data replication strategy, called HRS (Hierarchical Replication Strategy), to improve data access efficiency in a cluster grid. We simulate our algorithm to evaluate various combinations of data access patterns. We also implement HCS and HRS in the Taiwan Unigrid environment. The simulation and experiment results show that HCS and HRS successfully reduce data access time and the amount of inter-cluster communication in comparison with other strategies in a cluster grid.
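The idea of dispatching a job to where its needed data are present can be illustrated with a small sketch. This is not the HCS policy itself, whose details the abstract does not give. The function name `pick_cluster`, the replica catalog layout, and the bytes-already-local criterion are assumptions made for illustration.

```python
def pick_cluster(job_files, replica_catalog, file_sizes):
    """Pick the cluster that already holds the most bytes the job needs.

    job_files:       iterable of file names the job reads
    replica_catalog: dict mapping cluster name -> set of file names stored there
    file_sizes:      dict mapping file name -> size in bytes

    Returns (cluster_name, bytes_to_fetch_from_other_clusters).
    """
    job_files = list(job_files)
    # Bytes of the job's input already present in each cluster.
    local_bytes = {
        cluster: sum(file_sizes[f] for f in job_files if f in files)
        for cluster, files in replica_catalog.items()
    }
    # Ties are broken by catalog order (first cluster wins).
    best = max(local_bytes, key=local_bytes.get)
    total = sum(file_sizes[f] for f in job_files)
    return best, total - local_bytes[best]
```

Under this criterion, a job whose inputs are all stored in one cluster is sent there and needs no inter-cluster transfer. A replication strategy would then aim to reduce the remaining remote bytes for jobs whose inputs are split across clusters.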
Because of the dynamic and heterogeneous nature of a grid infrastructure, the client/server paradigm is a common programming model for these environments, where the client submits requests to several geographically remote servers for executing already deployed applications on its own data. According to this model, the applications are usually decomposed into independent tasks that are solved concurrently by the servers (the so-called Data Grid applications). On the other hand, as many scientific applications are characterized by very large sets of input data and dependencies among subproblems, avoiding unnecessary synchronizations and data transfers is a difficult task. This work addresses the problem of implementing a strategy for efficient task scheduling and data management in the case of data dependencies among subproblems in the same Linear Algebra application. For the purpose of the experiments, the NetSolve distributed computing environment has been used and some minor changes h...
With the regular progress of technology and infrastructures, a growing number of grid applications are being developed and deployed for life science and medical research. At the last HealthGrid conference in April 2005 in Oxford, many groups described successful usage of grids for compute-intensive calculations. A very large-scale deployment of a biomedical application in the area of drug discovery was achieved on EGEE during 2005. On the other hand, besides a few pioneers, very few data grids have been deployed so far and knowledge grids are still at a conceptual level. This situation is expected to evolve quickly as many projects are focussed on developing data management services and knowledge management tools relevant to biomedical sciences. At this stage, it is important to identify the potential bottlenecks and to define a roadmap for the wide adoption of grids for healthcare. This article presents an analysis of the present adoption of grids for biomedical sciences and healthcare...
We report on the status of current technology in the fields of job submission and scheduling (workload management) in a world-wide data grid environment.
We review several aspects of building real-time streaming data Grid applications. Building on general purpose messaging system software (NaradaBrokering) and generalized collaboration services (GlobalMMCS), we are developing a diverse set of interoperable capabilities. These include dynamic information systems for managing short-lived collaborative service collections (“gaggles”), stream filters to support the integration of Geographical Information Systems services with data analysis applications, streaming ...
Data Grid environments seek to harness geographically distributed resources that deal with data-intensive problems such as those encountered in high energy physics, bio-informatics, and other disciplines. In general, grids enable the efficient sharing and management of computing resources for the purpose of performing large complex tasks. To share data, the use of replication is recommended. This technique improves performance, fault tolerance, and load balancing. Replication management and its implementation are not simple tasks, and they introduce other problems, such as the consistency management of replicas. One of the major concerns in the so-called optimistic approaches to consistency management is conflict resolution among replicas. In this paper we present negotiation mechanisms based on the various negotiation forms between virtual consistency agents to be able to handle critical situations for sites and to reduce the number of conflicts among repl...
A preservation environment manages communication from the past while communicating with the future. Information generated in the past is sent into the future by the current preservation environment. The proof that the preservation environment preserves authenticity and integrity while performing the communication constitutes a theory of digital preservation. We examine the representation information that is needed about the preservation environment for a theory of digital preservation. The representation information includes descriptions of the preservation management policies, the preservation processes, and the state information that is needed to verify the correct working behavior of the system. We demonstrate rule-based data grids that can verify that prior policies correctly enforced preservation properties, while sending into the future descriptions of the current preservation management policies.