Autonomic query parallelization using non-dedicated computers: an evaluation of adaptivity options

Published: 01 January 2009

    Abstract

    Writing parallel programs that can take advantage of non-dedicated processors is much more difficult than writing such programs for networks of dedicated processors. In a non-dedicated environment such programs must use autonomic techniques to respond to the unpredictable load fluctuations that prevail in the computational environment. In adaptive query processing (AQP), several techniques have been proposed for dynamically redistributing processor load assignments throughout a computation to take account of varying resource capabilities, but we know of no previous study that compares their performance. This paper presents a simulation-based evaluation of these autonomic parallelization techniques in a uniform environment and compares how well they improve the performance of the computation. Four published strategies are compared with a new algorithm that seeks to overcome some weaknesses identified in the existing approaches. In addition, we explore the use of techniques from online algorithms to provide a firm foundation for determining when to adapt in two of the existing algorithms. The evaluations identify situations in which each strategy may be used effectively and in which it should be avoided.

    Published In

    The VLDB Journal — The International Journal on Very Large Data Bases  Volume 18, Issue 1
    January 2009
    373 pages

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Cited By

    • (2016) Optimizing virtual machine placement for energy and SLA in clouds using utility functions. Journal of Cloud Computing: Advances, Systems and Applications 5:1 (1-17). DOI: 10.1186/s13677-016-0067-7. Online publication date: 1-Dec-2016
    • (2015) Fault-tolerant resource allocation for query processing in grid environments. International Journal of Web and Grid Services 11:2 (143-159). DOI: 10.1504/IJWGS.2015.068895. Online publication date: 1-Apr-2015
    • (2014) Scalable and adaptive online joins. Proceedings of the VLDB Endowment 7:6 (441-452). DOI: 10.14778/2732279.2732281. Online publication date: 1-Feb-2014
    • (2012) Balancing reducer skew in MapReduce workloads using progressive sampling. Proceedings of the Third ACM Symposium on Cloud Computing (1-14). DOI: 10.1145/2391229.2391245. Online publication date: 14-Oct-2012
    • (2012) Efficient load balancing in partitioned queries under random perturbations. ACM Transactions on Autonomous and Adaptive Systems 7:1 (1-27). DOI: 10.1145/2168260.2168265. Online publication date: 4-May-2012
    • (2011) An efficient skew-insensitive algorithm for join processing on grid architectures. Proceedings of the fifth international workshop on High-level parallel programming and applications (11-18). DOI: 10.1145/2034751.2034756. Online publication date: 18-Sep-2011
    • (2009) Adaptive workload allocation in query processing in autonomous heterogeneous environments. Distributed and Parallel Databases 25:3 (125-164). DOI: 10.1007/s10619-008-7032-5. Online publication date: 1-Jun-2009
