Extensible Parallel Query Processing for Exploratory Geoscientific Data Mining

Published: 01 October 2001

Abstract

Exploratory data mining and analysis requires a computing environment that provides facilities for the user-friendly expression and rapid execution of “scientific queries.” In this paper, we address research issues in the parallelization of scientific queries containing complex user-defined operations. In a parallel query execution environment, parallelizing a query execution plan involves determining how the input data streams to evaluators that implement logical operations can be divided so that clones of the same evaluator process them in parallel. We introduced the concept of a “relevance window,” which characterizes the data lineage and data partitioning opportunities available for a user-defined evaluator. In addition, we developed a query parallelization framework by extending relational parallel query optimization algorithms so that the parallelization characteristics of user-defined evaluators guide the process of query parallelization in an extensible query processing environment. We demonstrated the utility of our system by performing experiments mining cyclonic activity, blocking events, and upward wave-energy propagation features from several observational and model simulation datasets.
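
The abstract does not define the relevance window formally, so the following is only an illustrative sketch under stated assumptions, not the paper's algorithm. It assumes a one-dimensional time series and a hypothetical evaluator whose output at position i depends only on inputs i-BEFORE through i+AFTER. Under that assumption, the input stream can be cut into contiguous ranges, each padded with the extra elements the window requires, and handed to clones of the same evaluator. All names (`BEFORE`, `AFTER`, `evaluator`, `partition`, `run_clone`, `parallel_evaluate`) are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed relevance window: output i depends on inputs i-BEFORE .. i+AFTER.
BEFORE, AFTER = 2, 0


def evaluator(chunk, first, last):
    """Hypothetical user-defined evaluator: mean over the relevance window.

    Produces outputs for chunk positions first..last-1. The window is
    truncated where it runs past either end of the chunk.
    """
    out = []
    for i in range(first, last):
        lo = max(0, i - BEFORE)
        window = chunk[lo:i + AFTER + 1]
        out.append(sum(window) / len(window))
    return out


def partition(n, clones):
    """Split range(n) into contiguous (start, stop) output ranges."""
    step = max(1, -(-n // clones))  # ceiling division, at least 1
    return [(s, min(s + step, n)) for s in range(0, n, step)]


def run_clone(data, start, stop):
    """Give one clone its output range plus the inputs its window needs."""
    lo = max(0, start - BEFORE)
    hi = min(len(data), stop + AFTER)
    chunk = data[lo:hi]
    return evaluator(chunk, start - lo, stop - lo)


def parallel_evaluate(data, clones=4):
    """Run clones of the evaluator over padded partitions and concatenate."""
    ranges = partition(len(data), clones)
    with ThreadPoolExecutor(max_workers=clones) as pool:
        parts = pool.map(lambda r: run_clone(data, *r), ranges)
    return [x for part in parts for x in part]


if __name__ == "__main__":
    series = [float(x) for x in range(10)]
    sequential = evaluator(series, 0, len(series))
    assert parallel_evaluate(series, clones=3) == sequential
```

Because each clone receives every input that falls inside the window of its outputs, the concatenated result matches a single sequential pass. Threads are used here only to keep the sketch short; in CPython they would not speed up CPU-bound work, and a process pool or a distributed executor would be the more realistic choice.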

Cited By

  • (2010) A framework for moving sensor data query and retrieval of dynamic atmospheric events. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, pp. 96-113. DOI: 10.5555/1876037.1876049. Online publication date: 30-Jun-2010.

Published In

Data Mining and Knowledge Discovery, Volume 5, Issue 4
October 2001
108 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. blocking events
  2. cyclone
  3. extensible user-defined operations
  4. geoscientific data mining
  5. parallel query processing
  6. upward wave-energy propagation

Qualifiers

  • Article
