Abstract
Uncertain data are usually represented in terms of an uncertainty region over which a probability density function (pdf) is defined. In the context of uncertain data management, there has been a growing interest in clustering uncertain data. In particular, the classic K-means clustering algorithm has recently been adapted to handle uncertain data. However, the centroid-based partitional clustering approach used in the adapted K-means has two major weaknesses: (i) an accuracy issue, since cluster centroids are computed as deterministic objects using the expected values of the pdfs of the clustered objects; and (ii) an efficiency issue, since computing the expected distance between uncertain objects and cluster centroids is expensive.
In this paper, we address the problem of clustering uncertain data by proposing a K-medoids-based algorithm, called UK-medoids, which is designed to overcome the above issues. In particular, our UK-medoids algorithm employs distance functions properly defined for uncertain objects, and exploits a K-medoids scheme. Experiments have shown that UK-medoids outperforms existing algorithms from an accuracy viewpoint while achieving reasonably good efficiency.
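To make the general idea concrete, the following is a minimal sketch of a K-medoids loop over uncertain objects. It is not the UK-medoids algorithm as defined in the paper: the abstract does not specify the distance functions, so the sketch makes its own assumptions. Each uncertain object is assumed to be a finite list of samples drawn from its pdf, the distance between two objects is assumed to be the expected squared Euclidean distance estimated over all sample pairs, and the names `expected_sq_distance`, `assign`, and `k_medoids` are hypothetical.

```python
import random
from itertools import product


def expected_sq_distance(a, b):
    """Estimate E[||X - Y||^2] as the mean over all sample pairs of a and b.

    Assumption: each uncertain object is a list of equally weighted
    sample points (tuples of floats) drawn from its pdf.
    """
    total = 0.0
    for p, q in product(a, b):
        total += sum((x - y) ** 2 for x, y in zip(p, q))
    return total / (len(a) * len(b))


def assign(dist, medoids):
    """Label every object with the index of its nearest medoid."""
    k = len(medoids)
    return [min(range(k), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(objects, k, init=None, max_iter=100, seed=0):
    """Plain K-medoids over a precomputed object-to-object distance matrix."""
    n = len(objects)
    # Pairwise distances are computed once and reused in every iteration,
    # because medoids are always chosen among the input objects.
    dist = [[expected_sq_distance(objects[i], objects[j]) for j in range(n)]
            for i in range(n)]
    if init is not None:
        medoids = list(init)
    else:
        medoids = random.Random(seed).sample(range(n), k)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if its cluster became empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes the summed distance to its cluster.
            best = min(members, key=lambda m: sum(dist[m][i] for i in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)
```

Because every cluster representative is itself one of the input objects, the sketch never builds a deterministic centroid from expected values, and the pairwise distances need to be computed only once. Like any K-medoids procedure, the result depends on the initial medoids, which can be passed explicitly through `init`.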
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gullo, F., Ponti, G., Tagarelli, A. (2008). Clustering Uncertain Data Via K-Medoids. In: Greco, S., Lukasiewicz, T. (eds) Scalable Uncertainty Management. SUM 2008. Lecture Notes in Computer Science, vol 5291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87993-0_19
DOI: https://doi.org/10.1007/978-3-540-87993-0_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87992-3
Online ISBN: 978-3-540-87993-0
eBook Packages: Computer Science, Computer Science (R0)