Abstract
Given a point query Q in multi-dimensional space, a K-Nearest Neighbor (KNN) query returns the K answers in the database that are closest to Q. Many of these answers may be very similar to one or more of the other answers, especially when the data has clusters. For a variety of applications, such homogeneous result sets may not add value to the user. In this paper, we consider the problem of providing diversity in the results of KNN queries, that is, producing the closest result set such that each answer is sufficiently different from the rest. We first propose a user-tunable definition of diversity, and then present an algorithm, called MOTLEY, for producing a diverse result set as per this definition. Through a detailed experimental evaluation, we show that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database. Further, it imposes no additional overhead on the evaluation of traditional KNN queries, thereby providing a seamless interface between diversity and distance.
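The contrast between a plain KNN result and a diverse one can be illustrated with a small sketch. It is not the MOTLEY algorithm and does not use the paper's diversity definition: it assumes Euclidean distance and a hypothetical threshold `min_div` (the minimum allowed distance between any two answers), and it sorts the whole dataset rather than reading only a fraction of the tuples. The function names and sample points are likewise illustrative.

```python
import math

def knn(points, q, k):
    """Plain KNN: the k points closest to q by Euclidean distance."""
    return sorted(points, key=lambda p: math.dist(p, q))[:k]

def diverse_knn(points, q, k, min_div):
    """Greedy sketch (assumed criterion, not the paper's definition):
    scan points in increasing distance from q and keep a point only if
    it lies at least min_div away from every point kept so far."""
    result = []
    for p in sorted(points, key=lambda p: math.dist(p, q)):
        if all(math.dist(p, r) >= min_div for r in result):
            result.append(p)
            if len(result) == k:
                break
    return result

# Three clustered points near the query, plus two points further away.
points = [(1.0, 0.0), (1.1, 0.0), (1.2, 0.0), (0.0, 2.0), (-3.0, 0.0)]
q = (0.0, 0.0)

plain = knn(points, q, 3)                         # all three from the cluster
diverse = diverse_knn(points, q, 3, min_div=1.0)  # one per "region"
```

With `min_div=0.0` the greedy scan accepts every point and returns the same answers as plain KNN, which mirrors the idea of a user-tunable trade-off between diversity and distance.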
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jain, A., Sarda, P., Haritsa, J.R. (2004). Providing Diversity in K-Nearest Neighbor Query Results. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_49
DOI: https://doi.org/10.1007/978-3-540-24775-3_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3
eBook Packages: Springer Book Archive