Efficient main-memory top-K selection for multicore architectures

Published: 01 October 2019
  • Abstract

    Efficient Top-k query evaluation relies on practices that utilize auxiliary data structures to enable early termination. Such techniques were designed to trade off complex work in the buffer pool against costly access to disk-resident data. Parallel in-memory Top-k selection with support for early termination presents a novel challenge because computation shifts higher up in the memory hierarchy. In this environment, data scan methods using SIMD instructions and multithreading perform well despite requiring evaluation of the complete dataset. Early termination schemes that favor simplicity require random access to resolve score ambiguity, while those optimized for sequential access incur too many object evaluations. In this work, we introduce the concept of rank uncertainty, a measure of work efficiency that enables classifying existing solutions according to their potential for efficient parallel in-memory Top-k selection. We identify data reordering and layering strategies as those having the highest potential and provide practical guidelines on how to adapt them for parallel in-memory execution (creating the VTA and SLA approaches). In addition, we show that the number of object evaluations can be further decreased by combining data reordering with angle space partitioning (introducing PTA). Our extensive experimental evaluation, which varies query parameters on both synthetic and real data, shows that PTA exhibits between 2 and 4 orders of magnitude better query latency and throughput than prior work and our optimized algorithmic variants (i.e., VTA and SLA).
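
To make the contrast between a complete data scan and early termination concrete, the following is a minimal single-threaded sketch. It is not the paper's VTA, SLA, or PTA; it is a generic illustration that assumes a linear scoring function with non-negative weights and uses a simple query-independent reordering (objects sorted by their largest attribute value). All function names are hypothetical and chosen for this example only.

```python
import heapq
from typing import List, Sequence, Tuple


def score(obj: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scoring function (assumed for this sketch): sum of w_i * x_i."""
    return sum(w * x for w, x in zip(weights, obj))


def topk_full_scan(data: Sequence[Sequence[float]],
                   weights: Sequence[float],
                   k: int) -> List[Tuple[float, int]]:
    """Baseline: evaluate every object and keep the k highest (score, id) pairs."""
    return heapq.nlargest(k, ((score(o, weights), i) for i, o in enumerate(data)))


def reorder_by_max_attribute(data: Sequence[Sequence[float]]) -> List[int]:
    """Query-independent preprocessing: object ids ordered by their largest
    attribute value, descending. This is an illustrative reordering only."""
    return sorted(range(len(data)), key=lambda i: max(data[i]), reverse=True)


def topk_early_termination(data: Sequence[Sequence[float]],
                           order: Sequence[int],
                           weights: Sequence[float],
                           k: int) -> Tuple[List[Tuple[float, int]], int]:
    """Scan objects in the precomputed order and stop once no remaining object
    can beat the current k-th best score. Requires non-negative weights."""
    total_w = sum(weights)
    heap: List[Tuple[float, int]] = []  # min-heap holding the current top-k
    evaluated = 0
    for i in order:
        # With non-negative weights, score(x) <= sum(w) * max(x). Objects later
        # in the order have a smaller or equal max(x), so this bound also
        # covers every object not yet visited.
        bound = total_w * max(data[i])
        if len(heap) == k and heap[0][0] >= bound:
            break
        evaluated += 1
        s = score(data[i], weights)
        if len(heap) < k:
            heapq.heappush(heap, (s, i))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, i))
    return sorted(heap, reverse=True), evaluated
```

Calling `topk_early_termination(data, reorder_by_max_attribute(data), weights, k)` returns the top-k `(score, id)` pairs together with the number of objects actually scored, so the second value can be compared against `len(data)` to see how much work the stopping rule saved. The bound used here is deliberately loose; tighter bounds evaluate fewer objects, which is the kind of work efficiency the abstract refers to.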

    Published In

    Proceedings of the VLDB Endowment, Volume 13, Issue 2
    October 2019
    140 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 October 2019
    Published in PVLDB Volume 13, Issue 2

    Qualifiers

    • Research-article

    Cited By

    • (2024) Split-bucket partition (SBP): a novel execution model for top-K and selection algorithms on GPUs. The Journal of Supercomputing, 80:11 (15122-15160). DOI: 10.1007/s11227-024-06031-x. Online publication date: 1-Jul-2024.
    • (2023) Fair&Share: Fast and Fair Multi-Criteria Selections. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (152-162). DOI: 10.1145/3583780.3614874. Online publication date: 21-Oct-2023.
    • (2022) Fast, exact, and parallel-friendly outlier detection algorithms with proximity graph in metric spaces. The VLDB Journal: The International Journal on Very Large Data Bases, 31:4 (797-821). DOI: 10.1007/s00778-022-00729-1. Online publication date: 1-Jul-2022.
    • (2021) Fast and Exact Outlier Detection in Metric Spaces. Proceedings of the 2021 International Conference on Management of Data (36-48). DOI: 10.1145/3448016.3452782. Online publication date: 9-Jun-2021.
