10.5555/1316689.1316769
Article

Algebraic manipulation of scientific datasets

Published: 31 August 2004

Abstract

We investigate algebraic processing strategies for large numeric datasets equipped with a possibly irregular grid structure. Such datasets arise, for example, in computational simulations, observation networks, medical imaging, and 2-D and 3-D rendering. Existing approaches for manipulating these datasets are incomplete. SQL queries over large numeric datasets do not perform competitively with specialized tools. Database extensions for processing multidimensional discrete data can model only regular, rectilinear grids. Visualization software libraries are designed to process gridded datasets efficiently, but no algebra has been developed to simplify their use and afford optimization. Further, these libraries are data dependent: physical changes to data representation or organization break user programs. In this paper, we present an algebra of grid-fields for manipulating both regular and irregular gridded datasets, algebraic optimization techniques, and an implementation backed by experimental results. We compare our techniques to those of spatial databases and visualization software libraries, using real examples from an Environmental Observation and Forecasting System. We find that our approach can express optimized plans inaccessible to other techniques, resulting in improved performance with reduced programming effort.

Published In

VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
August 2004
1380 pages

Sponsors

  • VLDB Endowment: Very Large Database Endowment

Publisher

VLDB Endowment

Cited By

  • (2014) Toward unstructured mesh algebra and query language. Proceedings of the 2014 SIGMOD PhD symposium, pages 16-20. 10.1145/2602622.2602626. Online publication date: 18-Jun-2014
  • (2013) ImG-complex. Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pages 1619-1624. 10.1145/2505515.2505733. Online publication date: 27-Oct-2013
  • (2010) Beyond rasters. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 320-329. 10.1145/1869790.1869835. Online publication date: 2-Nov-2010
  • (2009) Towards integrated and efficient scientific sensor data processing. Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pages 922-933. 10.1145/1516360.1516466. Online publication date: 24-Mar-2009
  • (2005) Querying and Visualizing Gridded Datasets for e-Science. Proceedings of the 21st International Conference on Data Engineering, pages 1106-1107. 10.1109/ICDE.2005.117. Online publication date: 5-Apr-2005
