
Chunk-oriented dimension ordering for efficient range query processing on sparse multidimensional data

Published: 09 September 2022

Abstract

Range query processing is of vital importance in array data management. Evaluating range queries efficiently on sparse multidimensional data is challenging in many applications. Range query performance is strongly affected by the dimension order used, so optimizing the dimension order for query performance is highly desirable. Prior work focuses only on optimizing a global dimension order for the whole data set. However, the data distribution and the query distribution may differ across different parts of the data, and a global dimension order is too coarse-grained to achieve good query performance. A fine-grained dimension order optimization is therefore essential. In this paper, to exploit the optimization opportunities of fine-grained dimension ordering for range query processing, we first design a two-level linearization method for storing and querying sparse multidimensional data. Unlike previous work, which usually uses a global dimension order, the two-level linearization method allows the dimension order to be specified separately for different parts of the data, called chunks. To achieve fine-grained dimension order optimization, we present the chunk-oriented dimension ordering problem for the first time and propose workload-driven dimension ordering algorithms for the uniform case and the non-uniform independent case, respectively. Furthermore, to cope with changing workloads in practical applications, we design a dynamic dimension reordering method that tracks query trends over time and avoids query performance degradation. Finally, experiments are conducted on both synthetic and real-life data to illustrate the effectiveness of our method.
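To make the idea of per-chunk dimension orders concrete, the following is a minimal Python sketch, not the paper's actual method. It assumes fixed-size rectangular chunks, a row-major linearization at both levels, and a dictionary that overrides the dimension order for selected chunks. The names `linearize`, `two_level_key`, `count_runs`, `chunk_orders`, and `default_order` are hypothetical and chosen only for illustration. In a sparse store, only the non-empty cells would be kept, sorted by the resulting key.

```python
from itertools import product


def linearize(coord, shape, order):
    """Offset of `coord` inside a box of `shape`.

    `order` lists the dimensions from slowest-varying to fastest-varying.
    """
    offset = 0
    for d in order:
        offset = offset * shape[d] + coord[d]
    return offset


def two_level_key(cell, chunk_shape, grid_shape, chunk_orders, default_order):
    """Return (chunk_id, in_chunk_offset) for a cell.

    Level 1 orders the chunks with `default_order` (an assumption).
    Level 2 orders cells inside a chunk with that chunk's own order.
    """
    chunk_coord = tuple(c // s for c, s in zip(cell, chunk_shape))
    local = tuple(c % s for c, s in zip(cell, chunk_shape))
    chunk_id = linearize(chunk_coord, grid_shape, default_order)
    order = chunk_orders.get(chunk_id, default_order)
    return chunk_id, linearize(local, chunk_shape, order)


def count_runs(lo, hi, shape, order):
    """Count contiguous offset runs touched by the inclusive range [lo, hi]
    inside a single chunk. Fewer runs means fewer separate scans."""
    cells = product(*(range(l, h + 1) for l, h in zip(lo, hi)))
    offsets = sorted(linearize(c, shape, order) for c in cells)
    return 1 + sum(1 for a, b in zip(offsets, offsets[1:]) if b != a + 1)


if __name__ == "__main__":
    chunk_shape = (4, 4)          # cells per chunk along each dimension
    grid_shape = (2, 2)           # chunks along each dimension
    chunk_orders = {2: (1, 0)}    # chunk 2 uses a non-default order

    print(two_level_key((5, 2), chunk_shape, grid_shape, chunk_orders, (0, 1)))

    # A query spanning all of dimension 0 at one value of dimension 1:
    print(count_runs((0, 1), (3, 1), chunk_shape, (0, 1)))
    print(count_runs((0, 1), (3, 1), chunk_shape, (1, 0)))
```

In this toy setting, the same range inside a 4 x 4 chunk touches four separate runs under the order `(0, 1)` but a single run under `(1, 0)`. This illustrates why choosing the order per chunk, according to the queries that hit that chunk, can help.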


Published In

World Wide Web, Volume 26, Issue 4
Jul 2023
971 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 09 September 2022
Accepted: 29 August 2022
Revision received: 20 August 2022
Received: 09 May 2022

Author Tags

  1. Multidimensional data
  2. Range query
  3. Dimension order
  4. Two-level linearization

Qualifiers

  • Research-article
