Abstract
Data-intensive applications executing over a computational grid demand large data transfers, which are costly operations. Taking these transfers into account is therefore mandatory to schedule data-intensive applications efficiently on grids. Further, within a heterogeneous and ever-changing environment such as a grid, better schedules are typically attained by heuristics that use dynamic information about the grid and the applications. However, this information is often difficult to obtain accurately. On the other hand, although there are schedulers that attain good performance without requiring dynamic information, they were not designed to take data transfer into account. This paper presents Storage Affinity, a novel scheduling heuristic for bag-of-tasks data-intensive applications running on grid environments. Storage Affinity exploits a data reuse pattern, common in many data-intensive applications, that allows it to take data transfer delays into account and reduce the makespan of the application. Further, it uses a replication strategy that yields efficient schedules without relying upon dynamic information that is difficult to obtain. Our results show that Storage Affinity may attain better performance than state-of-the-art knowledge-dependent schedulers. This is achieved at the expense of consuming more CPU cycles and network bandwidth.
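The data reuse idea can be illustrated with a minimal sketch. The abstract does not define how affinity is computed, so the code below assumes it is the number of bytes of a task's input already stored at a site, and it assigns each task greedily to the site with the highest such value. The names `storage_affinity` and `assign_tasks`, the data layout, and the tie-breaking rule are hypothetical, and the replication strategy mentioned above is not modelled.

```python
from typing import Dict, Set


def storage_affinity(task_inputs: Set[str],
                     site_files: Set[str],
                     file_sizes: Dict[str, int]) -> int:
    """Bytes of the task's input already stored at the site.

    Assumed definition for illustration; not taken from the abstract.
    """
    return sum(file_sizes[name] for name in task_inputs & site_files)


def assign_tasks(tasks: Dict[str, Set[str]],
                 sites: Dict[str, Set[str]],
                 file_sizes: Dict[str, int]) -> Dict[str, str]:
    """Greedily map each task to the site with the highest affinity.

    After a task is assigned, its input files are assumed to remain
    stored at that site, so later tasks sharing those files can reuse
    them. Ties are broken by site name, alphabetically.
    """
    # Copy the initial storage contents so the caller's data is untouched.
    stored = {site: set(files) for site, files in sites.items()}
    schedule: Dict[str, str] = {}
    for task, inputs in tasks.items():
        # max() returns the first maximal element, so iterating over the
        # sorted site names yields deterministic tie-breaking.
        best = max(sorted(stored),
                   key=lambda s: storage_affinity(inputs, stored[s], file_sizes))
        schedule[task] = best
        stored[best] |= inputs  # the transferred inputs are now reusable
    return schedule
```

In this sketch a task that shares input files with an earlier task tends to follow it to the same site, which is the effect of reusing data that has already been transferred.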
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Santos-Neto, E., Cirne, W., Brasileiro, F., Lima, A. (2005). Exploiting Replication and Data Reuse to Efficiently Schedule Data-Intensive Applications on Grids. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2004. Lecture Notes in Computer Science, vol 3277. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11407522_12
DOI: https://doi.org/10.1007/11407522_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25330-3
Online ISBN: 978-3-540-31795-1
eBook Packages: Computer Science, Computer Science (R0)