On m-Impact Regions and Standing Top-k Influence Problems

Authors:

Kyriakos Mouratidis,

Mingji Han

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data

Pages 1784 - 1796

https://doi.org/10.1145/3448016.3452832

Published: 18 June 2021

Abstract

In this paper, we study the m-impact region problem (mIR). In a context where users look for available products with top-k queries, mIR identifies the part of the product space that attracts the most user attention. Specifically, mIR determines the kind of attribute values that lead a (new or existing) product to the top-k result for at least a fraction of the user population. mIR has several applications, ranging from effective marketing to product improvement. Importantly, it also leads to (exact and efficient) solutions for standing top-k impact problems, which were previously solved heuristically only, or whose current solutions face serious scalability limitations. We experiment, among others, on data mined from actual user reviews for real products, and demonstrate the practicality and efficiency of our algorithms, both for mIR and for standing top-k impact problems.
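To make the problem statement concrete, the following is a minimal sketch of a membership test for a single candidate product. It is not the paper's algorithm, which computes the whole region rather than testing one point. It assumes linear scoring (a weight vector per user, higher score is better), which the abstract does not state, and every name in it (`score`, `in_top_k`, `impact_fraction`, `reaches_enough_users`, `min_fraction`) is hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def score(weights: Vector, product: Vector) -> float:
    """Linear utility of a product for one user (assumed scoring model)."""
    return sum(w * x for w, x in zip(weights, product))


def in_top_k(candidate: Vector, products: Sequence[Vector],
             weights: Vector, k: int) -> bool:
    """True if fewer than k existing products strictly outscore the candidate."""
    s = score(weights, candidate)
    better = sum(1 for p in products if score(weights, p) > s)
    return better < k


def impact_fraction(candidate: Vector, products: Sequence[Vector],
                    users: Sequence[Vector], k: int) -> float:
    """Fraction of users whose top-k result would contain the candidate."""
    hits = sum(1 for w in users if in_top_k(candidate, products, w, k))
    return hits / len(users)


def reaches_enough_users(candidate: Vector, products: Sequence[Vector],
                         users: Sequence[Vector], k: int,
                         min_fraction: float) -> bool:
    """Brute-force check that the candidate reaches the required user share."""
    return impact_fraction(candidate, products, users, k) >= min_fraction
```

For example, calling `reaches_enough_users((0.6, 0.6), products, users, k=2, min_fraction=0.5)` scans every user once. Under these assumptions, the region that mIR identifies would be the set of all attribute vectors for which such a check succeeds.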

Supplementary Material

MP4 File (3448016.3452832.mp4)


Index Terms

On m-Impact Regions and Standing Top-k Influence Problems
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Top-k retrieval in databases
  2. Information systems applications
    1. Decision support systems
      1. Data analytics

Published In

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data

June 2021, 2969 pages

ISBN: 9781450383431

DOI: 10.1145/3448016

General Chairs: Guoliang Li, Tsinghua University (China); Zhanhuai Li, Northwestern Polytechnical University (China)

Program Chairs: Stratos Idreos, Harvard University (USA); Divesh Srivastava, AT&T (USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Education Department of Guangdong
NSFC
Guangdong Provincial Key Laboratory

Conference

SIGMOD/PODS '21

Sponsor:

SIGMOD

SIGMOD/PODS '21: International Conference on Management of Data

June 20 - 25, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
