Abstract
Data archiving is widely used in many fields for backup and analysis. Although comprehensive application software, new computing and storage technologies, and the Internet have made it easier to create, collect, and store all types of data, storing, accessing, and managing database archives in a meaningful and cost-effective way remains extremely challenging. In this paper, we focus on hierarchical data archiving, which is widely used in scientific and web data management. First, we propose a novel compaction scheme for archiving hierarchical data. By compacting both data and timestamps, our scheme substantially reduces not only the storage required but also the incremental archiving time. Second, we design a temporal query language to support data retrieval from the compact archives. Third, because compacting data and timestamps may add significant overhead to query evaluation, we investigate how to reduce this overhead by exploiting the characteristics of the queries and of the archived hierarchical data. Finally, we conduct extensive experiments to demonstrate the effectiveness and efficiency of both our storage and query optimization techniques.
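To make the idea of timestamp compaction concrete, the following is a minimal sketch, not the scheme proposed in the paper. It assumes that each archived node carries the set of integer version numbers in which it exists, and it merges runs of consecutive versions into inclusive intervals. The names `compact_versions` and `present_at` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple

# Inclusive (first_version, last_version) pair; an assumed representation.
Interval = Tuple[int, int]


def compact_versions(versions: Iterable[int]) -> List[Interval]:
    """Merge consecutive version numbers into inclusive intervals."""
    intervals: List[Interval] = []
    for v in sorted(set(versions)):
        if intervals and v == intervals[-1][1] + 1:
            # v extends the most recent interval by one version.
            intervals[-1] = (intervals[-1][0], v)
        else:
            # v starts a new interval.
            intervals.append((v, v))
    return intervals


def present_at(intervals: List[Interval], version: int) -> bool:
    """Return True if the node exists in the given version."""
    return any(lo <= version <= hi for lo, hi in intervals)


if __name__ == "__main__":
    stamp = compact_versions([1, 2, 3, 5, 6, 9])
    print(stamp)
    print(present_at(stamp, 5), present_at(stamp, 4))
```

Under this assumed representation, a node that persists across many consecutive versions costs one interval rather than one entry per version. A temporal query then has to test interval membership instead of looking up a version directly, which is one simple illustration of the kind of query-time overhead that compaction can introduce.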
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, H., Liu, R., Theodoratos, D., Wu, X. (2011). Efficient Storage and Temporal Query Evaluation in Hierarchical Data Archiving Systems. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_7
DOI: https://doi.org/10.1007/978-3-642-22351-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22350-1
Online ISBN: 978-3-642-22351-8
eBook Packages: Computer Science, Computer Science (R0)