Abstract
Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover the k objects, e.g., the top-k restaurants, with the highest overall scores aggregated from different attributes, e.g., price and location. However, emerging applications such as query recommendation require the best combinations of attributes rather than the best objects. A straightforward extension of existing top-k algorithms is prohibitively expensive for this task because it must enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which finds the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms that differ in their data access methods (sorted accesses and random accesses) and in their query certainty (exact and approximate query processing). We prove that our algorithms are instance optimal and analyze bounds on the depth of accesses. We further develop optimizations that reduce the computational cost, the memory cost and the number of accesses during query evaluation. We present a case study of a real application of top-k, m queries in an online biomedical search engine. Finally, we report comprehensive experiments on multiple real-life datasets that demonstrate the scalability and efficiency of the top-k, m algorithms.
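To make the query definition concrete, the following Python sketch implements the brute-force baseline that the abstract describes as prohibitively expensive. It is not one of the algorithms proposed in the article. Several details are assumptions made only for illustration: attributes are partitioned into groups, a combination picks one attribute per group, each attribute is a list mapping object IDs to scores, an object's score in a combination is the sum of its scores in the chosen lists, and only objects present in every chosen list are counted. The function name `naive_top_k_m` and the sample data are hypothetical.

```python
from heapq import nlargest
from itertools import product


def naive_top_k_m(groups, k, m):
    """Brute-force top-k, m: score every combination, keep the best k.

    groups: list of dicts, each mapping attribute name -> {object_id: score}.
    Assumption: a combination takes exactly one attribute from each group.
    """
    results = []
    # Enumerating the Cartesian product is the exponential step.
    for combo in product(*(group.items() for group in groups)):
        names = tuple(name for name, _ in combo)
        lists = [scores for _, scores in combo]
        # Assumption: only objects present in every chosen list count.
        common = set(lists[0]).intersection(*lists[1:])
        # Assumption: sum is the aggregation function for object scores.
        object_scores = [sum(lst[obj] for lst in lists) for obj in common]
        # The combination score aggregates its m best objects.
        combo_score = sum(nlargest(m, object_scores))
        results.append((combo_score, names))
    return nlargest(k, results)


# Hypothetical data: two groups with two attributes each, three objects.
groups = [
    {"A1": {"g1": 9.0, "g2": 7.0, "g3": 4.0},
     "A2": {"g1": 5.0, "g2": 7.0, "g3": 6.0}},
    {"B1": {"g1": 6.0, "g2": 3.0, "g3": 8.0},
     "B2": {"g1": 2.0, "g2": 8.0, "g3": 4.0}},
]

print(naive_top_k_m(groups, k=2, m=2))
# [(27.0, ('A1', 'B1')), (26.0, ('A1', 'B2'))]
```

With two groups of two attributes the loop visits only four combinations, but the count is the product of the group sizes, so it grows exponentially with the number of groups. This is the cost that motivates algorithms based on sorted and random accesses.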
Notes
Lists are sorted in decreasing order of score.
The score is computed by an aggregation of various scoring items provided by the NBA for the corresponding game.
The top-2 games of each combination are shown in Fig. 1b.
Hash indexes can be built to support random accesses.
Additional information
This work is partially supported by NSF BIGDATA 1447943, the Academy of Finland (310321), NSF China (61472427, 61502503), the DSAIR center at NTU, and Grant MOE2015-T2-2-069, Singapore.
About this article
Cite this article
Lin, C., Lu, J., Wei, Z. et al. Optimal algorithms for selecting top-k combinations of attributes: theory and applications. The VLDB Journal 27, 27–52 (2018). https://doi.org/10.1007/s00778-017-0485-2
DOI: https://doi.org/10.1007/s00778-017-0485-2