DOI: 10.5555/1325851.1325952
VLDB Conference Proceedings, research article

Efficiently answering top-k typicality queries on large databases

Published: 23 September 2007 Publication History

Abstract

Finding typical instances is an effective approach to understanding and analyzing large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically. To answer questions like "Who are the top-k most typical NBA players?", we develop the measure of simple typicality. To answer questions like "Who are the top-k most typical guards distinguishing guards from other players?", we propose the notion of discriminative typicality.
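As an illustration only, a top-k simple typicality query can be sketched as follows. The abstract does not give the formula, so this sketch assumes typicality is scored by a Gaussian kernel density estimate over the whole set; the names `simple_typicality`, `top_k_typical`, and `bandwidth` are hypothetical and not taken from the paper.

```python
import math

def simple_typicality(points, bandwidth=1.0):
    """Score each point by a Gaussian kernel density estimate over all points.

    Assumption: typicality is modeled as a kernel density; the bandwidth
    value is an arbitrary illustrative choice.
    """
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    scores = []
    for p in points:
        total = 0.0
        for q in points:  # every point is compared with every other point
            d = math.dist(p, q)
            total += math.exp(-(d * d) / (2.0 * bandwidth * bandwidth))
        scores.append(norm * total)
    return scores

def top_k_typical(points, k, bandwidth=1.0):
    """Return the indices of the k highest-scoring points, best first."""
    scores = simple_typicality(points, bandwidth)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Three points close together and one far away: the far point ranks last.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
ranking = top_k_typical(points, k=4)
```

The nested loop compares every pair of points, so the cost of this exact computation grows quadratically with the number of objects.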
Computing the exact answer to a top-k typicality query requires quadratic time, which is often too costly for online query answering on large databases. We develop a series of approximation methods for various situations. (1) The randomized tournament algorithm has linear complexity, though it does not provide a theoretical guarantee on the quality of the answers. (2) The direct local typicality approximation using VP-trees provides an approximation quality guarantee. (3) A VP-tree can be exploited to index a large set of objects; typicality queries can then be answered efficiently, with quality guarantees, by a tournament method based on a Local Typicality Tree data structure. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and that our methods are practical.
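The randomized tournament idea in (1) can be sketched roughly as below. This is one plausible reading, not the paper's algorithm: it assumes objects are shuffled into small groups, the most typical object within each group advances, and rounds repeat until a single winner remains. The Gaussian-kernel scoring and the names `local_typicality`, `tournament_winner`, and `group_size` are assumptions made for illustration. The sketch finds only one winner; extending it to top-k is left out.

```python
import math
import random

def local_typicality(candidate, group, bandwidth=1.0):
    """Kernel-density score of `candidate` measured within `group` only."""
    total = sum(
        math.exp(-math.dist(candidate, q) ** 2 / (2.0 * bandwidth ** 2))
        for q in group
    )
    return total / (len(group) * bandwidth * math.sqrt(2.0 * math.pi))

def tournament_winner(points, group_size=10, bandwidth=1.0, rng=None):
    """Run one randomized tournament and return the surviving point."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    rng = rng or random.Random()
    survivors = list(points)
    while len(survivors) > 1:
        rng.shuffle(survivors)  # random assignment of objects to groups
        next_round = []
        for start in range(0, len(survivors), group_size):
            group = survivors[start:start + group_size]
            # The most typical object inside the group advances.
            best = max(group, key=lambda p: local_typicality(p, group, bandwidth))
            next_round.append(best)
        survivors = next_round
    return survivors[0]

# Hypothetical one-dimensional data: nine evenly spaced values.
line = [(float(i),) for i in range(9)]
winner = tournament_winner(line, group_size=3, rng=random.Random(0))
```

With a fixed group size, each round does work proportional to the number of survivors, and the number of survivors shrinks geometrically, so the total cost of one tournament is linear in the number of objects. Because groups are random, a strong candidate can be eliminated early, which is consistent with the absence of a quality guarantee noted above.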

Published In

VLDB '07: Proceedings of the 33rd international conference on Very large data bases
September 2007
1443 pages
ISBN: 9781595936493

Sponsors

  • Yahoo! Research
  • Google Inc.
  • SAP
  • Intel
  • Microsoft Research
  • ORACLE
  • Connex.cc
  • HP
  • WKO
  • IBM

Publisher

VLDB Endowment

Qualifiers

  • Research-article

Conference

VLDB '07
Sponsor:
  • Intel
  • Microsoft Research
  • ORACLE
  • IBM

Cited By

  • (2024) Top-k approximate selection for typicality query results over spatio-textual data. Knowledge and Information Systems 66:2 (1425-1468). DOI: 10.1007/s10115-023-02013-2. Online publication date: 1-Feb-2024
  • (2015) Top-k representative queries with binary constraints. Proceedings of the 27th International Conference on Scientific and Statistical Database Management (1-10). DOI: 10.1145/2791347.2791367. Online publication date: 29-Jun-2015
  • (2014) Answering top-k representative queries on graph databases. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (1163-1174). DOI: 10.1145/2588555.2610524. Online publication date: 18-Jun-2014
  • (2012) Answering Typicality Query Based on Automatically Prototype Construction. Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01 (362-366). DOI: 10.5555/2457524.2457615. Online publication date: 4-Dec-2012
  • (2011) Efficient top-k retrieval for user preference queries. Proceedings of the 2011 ACM Symposium on Applied Computing (1045-1052). DOI: 10.1145/1982185.1982414. Online publication date: 21-Mar-2011
  • (2010) Accessible image search for colorblindness. ACM Transactions on Intelligent Systems and Technology 1:1 (1-26). DOI: 10.1145/1858948.1858956. Online publication date: 22-Oct-2010
  • (2010) Splash. Proceedings of the 13th International Conference on Extending Database Technology (275-286). DOI: 10.1145/1739041.1739076. Online publication date: 22-Mar-2010
  • (2009) Using trees to depict a forest. Proceedings of the VLDB Endowment 2:1 (133-144). DOI: 10.14778/1687627.1687643. Online publication date: 1-Aug-2009
  • (2009) Accessible image search. Proceedings of the 17th ACM international conference on Multimedia (291-300). DOI: 10.1145/1631272.1631314. Online publication date: 23-Oct-2009
  • (2009) Unsupervised image ranking. Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining (81-88). DOI: 10.1145/1631058.1631074. Online publication date: 23-Oct-2009
