research-article

Efficient main-memory top-K selection for multicore architectures

Authors:

Vasileios Zois,

Vassilis J. Tsotras,

Walid A. Najjar

Proceedings of the VLDB Endowment, Volume 13, Issue 2

Pages 114 - 127

https://doi.org/10.14778/3364324.3364327

Published: 01 October 2019

Abstract

Efficient Top-k query evaluation relies on auxiliary data structures that enable early termination. Such techniques were designed to trade off complex work in the buffer pool against costly access to disk-resident data. Parallel in-memory Top-k selection with support for early termination presents a novel challenge because computation shifts higher up in the memory hierarchy. In this environment, data scan methods using SIMD instructions and multithreading perform well despite requiring evaluation of the complete dataset. Early termination schemes that favor simplicity require random access to resolve score ambiguity, while those optimized for sequential access incur too many object evaluations. In this work, we introduce the concept of rank uncertainty, a measure of work efficiency that enables classifying existing solutions according to their potential for efficient parallel in-memory Top-k selection. We identify data reordering and layering strategies as those having the highest potential and provide practical guidelines on how to adapt them for parallel in-memory execution (creating the VTA and SLA approaches). In addition, we show that the number of object evaluations can be further decreased by combining data reordering with angle space partitioning (introducing PTA). Our extensive experimental evaluation on varying query parameters, using both synthetic and real data, shows that PTA exhibits between 2 and 4 orders of magnitude better query latency and throughput when compared to prior work and our optimized algorithmic variants (i.e., VTA, SLA).
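
The contrast the abstract draws between full data scans and early termination with random access can be illustrated with a minimal Python sketch. This is not the paper's VTA, SLA, or PTA; it is a single-threaded toy under stated assumptions: a linear scoring function with non-negative weights, higher scores ranking first, and early termination in the style of the classic Threshold Algorithm. The names `topk_scan`, `topk_threshold`, `rows`, and `weights` are hypothetical and chosen only for illustration.

```python
import heapq


def _offer(heap, k, score, oid):
    """Keep the k highest (score, oid) pairs in a min-heap."""
    if len(heap) < k:
        heapq.heappush(heap, (score, oid))
    elif score > heap[0][0]:
        heapq.heapreplace(heap, (score, oid))


def topk_scan(rows, weights, k):
    """Baseline: evaluate every object once (complete dataset scan)."""
    heap = []
    for oid, attrs in enumerate(rows):
        score = sum(w * a for w, a in zip(weights, attrs))
        _offer(heap, k, score, oid)
    return sorted(heap, reverse=True)


def topk_threshold(rows, weights, k):
    """Early termination over per-attribute sorted lists.

    Assumes non-negative weights, so the threshold below is a valid
    upper bound on the score of any object not yet seen.
    Returns (results, number of objects evaluated).
    """
    n, d = len(rows), len(weights)
    # Auxiliary structure: object ids sorted descending on each attribute.
    lists = [sorted(range(n), key=lambda o, j=j: -rows[o][j]) for j in range(d)]
    seen, heap = set(), []
    for depth in range(n):
        for j in range(d):
            oid = lists[j][depth]
            if oid not in seen:
                seen.add(oid)
                # Random access: fetch all attributes to resolve the full score.
                score = sum(w * a for w, a in zip(weights, rows[oid]))
                _offer(heap, k, score, oid)
        # Best score any unseen object could still reach at this depth.
        threshold = sum(weights[j] * rows[lists[j][depth]][j] for j in range(d))
        if len(heap) == k and heap[0][0] >= threshold:
            break  # the current k-th best score cannot be beaten
    return sorted(heap, reverse=True), len(seen)


if __name__ == "__main__":
    rows = [(0.9, 0.1), (0.8, 0.8), (0.2, 0.95), (0.5, 0.5), (0.1, 0.2), (0.7, 0.6)]
    weights = (0.5, 0.5)
    print(topk_scan(rows, weights, 2))
    print(topk_threshold(rows, weights, 2))
```

The scan touches every object but reads memory sequentially, which suits SIMD and multithreading. The threshold variant evaluates fewer objects, yet each evaluation is a random access into `rows`; that is the trade-off the abstract describes.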

Published In

Proceedings of the VLDB Endowment Volume 13, Issue 2

October 2019

140 pages

ISSN:2150-8097

Editors: Magdalena Balazinska (University of Washington), Xiaofang Zhou (University of Queensland)

Publisher

VLDB Endowment

Cited By

Yang Y, Zhang G, Wu Y, Zhao Z, Fu Y (2024). Split-bucket partition (SBP): a novel execution model for top-K and selection algorithms on GPUs. The Journal of Supercomputing 80:11 (15122-15160). doi:10.1007/s11227-024-06031-x. Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.1007/s11227-024-06031-x

Cachel K, Rundensteiner E, Frommholz I, Hopfgartner F, Lee M, Oakes M, Lalmas M, Zhang M, Santos R (2023). Fair&Share: Fast and Fair Multi-Criteria Selections. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (152-162). doi:10.1145/3583780.3614874. Online publication date: 21-Oct-2023.
https://dl.acm.org/doi/10.1145/3583780.3614874

Amagata D, Onizuka M, Hara T (2022). Fast, exact, and parallel-friendly outlier detection algorithms with proximity graph in metric spaces. The VLDB Journal - The International Journal on Very Large Data Bases 31:4 (797-821). doi:10.1007/s00778-022-00729-1. Online publication date: 1-Jul-2022.
https://dl.acm.org/doi/10.1007/s00778-022-00729-1

Amagata D, Onizuka M, Hara T, Li G, Li Z, Idreos S, Srivastava D (2021). Fast and Exact Outlier Detection in Metric Spaces. Proceedings of the 2021 International Conference on Management of Data (36-48). doi:10.1145/3448016.3452782. Online publication date: 9-Jun-2021.
https://dl.acm.org/doi/10.1145/3448016.3452782
