research-article

Chunk-oriented dimension ordering for efficient range query processing on sparse multidimensional data

Authors:

Jianzhong Li

World Wide Web, Volume 26, Issue 4

Pages 1395 - 1433

https://doi.org/10.1007/s11280-022-01098-z

Published: 09 September 2022

Abstract

Range query processing is of vital importance in array data management. Evaluating range queries efficiently on sparse multidimensional data is challenging in many applications. Query performance depends heavily on the dimension order used, so optimizing the dimension order is essential for good performance. Prior work optimizes only a single global dimension order for the whole dataset. However, the data distribution and the query distribution can differ across different parts of the data, so a global dimension order is too coarse-grained to achieve good query performance, and a fine-grained dimension order optimization is needed. In this paper, to exploit the optimization opportunities of fine-grained dimension ordering for range query processing, we first design a two-level linearization method for storing and querying sparse multidimensional data. Unlike previous work, which usually uses a global dimension order, the two-level linearization method allows the dimension order to be specified separately for different parts of the data, called chunks. To realize fine-grained dimension order optimization, we formulate the chunk-oriented dimension ordering problem for the first time and propose workload-driven dimension ordering algorithms for the uniform case and the non-uniform independent case, respectively. Furthermore, to cope with changing workloads in practical applications, we design a dynamic dimension reordering method that tracks query trends over time and avoids query performance degradation. Finally, experiments conducted on both synthetic and real-life data illustrate the effectiveness of our method.
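The idea of giving each chunk its own dimension order can be illustrated with a small sketch. This is not the paper's method: it assumes a dense offset within each chunk (ignoring sparsity), uses the number of contiguous offset runs touched by a query as a stand-in cost, and picks an order by brute force over all permutations. All function names (`linearize`, `two_level_address`, `range_query_runs`, `best_order`) are hypothetical and introduced only for illustration.

```python
from itertools import permutations, product


def linearize(coord, shape, order):
    """Offset of `coord` in a box of `shape`; `order` lists the dimensions
    from slowest-varying to fastest-varying (illustrative assumption)."""
    offset = 0
    for d in order:
        offset = offset * shape[d] + coord[d]
    return offset


def two_level_address(cell, chunk_shape, grid_shape, chunk_orders, default_order):
    """Map a cell to (chunk id, offset inside the chunk).

    Level 1 places chunks using `default_order`; level 2 places the cell
    inside its chunk using that chunk's own order, if one is registered
    in `chunk_orders`, else `default_order`.
    """
    chunk_coord = tuple(c // s for c, s in zip(cell, chunk_shape))
    local = tuple(c % s for c, s in zip(cell, chunk_shape))
    chunk_id = linearize(chunk_coord, grid_shape, default_order)
    order = chunk_orders.get(chunk_id, default_order)
    return chunk_id, linearize(local, chunk_shape, order)


def count_runs(offsets):
    """Number of maximal runs of consecutive offsets (a rough proxy for seeks)."""
    s = sorted(offsets)
    return sum(1 for i, o in enumerate(s) if i == 0 or o != s[i - 1] + 1)


def range_query_runs(lo, hi, chunk_shape, order):
    """Runs touched inside one chunk by the inclusive box [lo, hi],
    given in chunk-local coordinates."""
    cells = product(*(range(l, h + 1) for l, h in zip(lo, hi)))
    return count_runs(linearize(c, chunk_shape, order) for c in cells)


def best_order(queries, chunk_shape):
    """Brute-force choice of the order with the fewest total runs for a
    chunk's query workload; only feasible for a handful of dimensions."""
    dims = range(len(chunk_shape))
    return min(
        permutations(dims),
        key=lambda o: sum(range_query_runs(lo, hi, chunk_shape, o) for lo, hi in queries),
    )
```

For a 4 x 4 chunk, a query that selects one value of dimension 1 across all of dimension 0, `range_query_runs((0, 1), (3, 1), (4, 4), order)`, touches 4 runs under order `(0, 1)` but only 1 run under order `(1, 0)`. A chunk whose workload is dominated by such queries would therefore prefer `(1, 0)`, while a chunk with a different workload could keep `(0, 1)`.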


Published In

World Wide Web Volume 26, Issue 4

Jul 2023

971 pages

ISSN: 1386-145X

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 09 September 2022

Accepted: 29 August 2022

Revision received: 20 August 2022

Received: 09 May 2022
