Abstract
Clustering has been extensively studied to deal with different kinds of data. Usually, datasets are represented as n-dimensional vectors of attributes described by numerical or nominal categorical values. Symbolic data is another concept, in which objects are more complex, such as intervals, multi-categorical data, or modal data. However, new applications might give rise to even more complex data describing, for example, customer desires, constraints, and preferences. Such data can be expressed more compactly using logic-based representations. In this paper, we introduce a new clustering framework in which complex objects are described by propositional formulas. First, we extend the two well-known clustering techniques, k-means and hierarchical agglomerative clustering. Second, we introduce a new divisive algorithm for clustering objects represented explicitly by sets of models. Finally, we propose a propositional satisfiability (SAT) based encoding of the problem of clustering propositional formulas that does not require an explicit representation of their models. Preliminary experimental results validating the proposed framework are provided.
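As a rough illustration of clustering formulas that are represented explicitly by their sets of models, the sketch below enumerates the models of each formula by brute force and groups the formulas with average-linkage hierarchical agglomerative clustering. It assumes a Jaccard distance between model sets; that distance, the linkage choice, and all names (`models`, `jaccard_distance`, `agglomerative`) are hypothetical choices for illustration, not the authors' definitions. Brute-force enumeration is only practical for a handful of variables, which is the limitation the SAT-based encoding described above is meant to avoid.

```python
from itertools import product
from typing import Callable, FrozenSet, List, Optional, Tuple

# An assignment gives a truth value to each variable, by position.
Assignment = Tuple[bool, ...]
# Assumption: a formula is modelled as a Python predicate over assignments.
Formula = Callable[[Assignment], bool]


def models(formula: Formula, n_vars: int) -> FrozenSet[Assignment]:
    """Enumerate all models of `formula` by brute force over 2**n_vars assignments."""
    return frozenset(a for a in product((False, True), repeat=n_vars) if formula(a))


def jaccard_distance(m1: FrozenSet[Assignment], m2: FrozenSet[Assignment]) -> float:
    """Assumed distance: 1 - |M1 & M2| / |M1 | M2|; two unsatisfiable formulas are at 0."""
    union = m1 | m2
    if not union:
        return 0.0
    return 1.0 - len(m1 & m2) / len(union)


def agglomerative(model_sets: List[FrozenSet[Assignment]], k: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering down to k clusters of formula indices."""
    clusters: List[List[int]] = [[i] for i in range(len(model_sets))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean pairwise distance between the members of two clusters.
        total = sum(jaccard_distance(model_sets[i], model_sets[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters (first pair wins on ties) and merge it.
        best: Optional[Tuple[float, int, int]] = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        assert best is not None
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    # Variables by position: a[0] = p, a[1] = q, a[2] = r.
    formulas: List[Formula] = [
        lambda a: a[0] and a[1],                              # p AND q
        lambda a: a[0] and a[1] and a[2],                     # p AND q AND r
        lambda a: (not a[0]) and (not a[1]),                  # NOT p AND NOT q
        lambda a: (not a[0]) and (not a[1]) and (not a[2]),   # NOT p AND NOT q AND NOT r
    ]
    model_sets = [models(f, 3) for f in formulas]
    print(agglomerative(model_sets, 2))  # → [[0, 1], [2, 3]]
```

In this toy example, the two formulas that force p and q true share models and end up together, as do the two that force them false; any pair drawn across the groups has disjoint model sets and is at distance 1.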
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Boudane, A., Jabbour, S., Sais, L., Salhi, Y. (2017). Clustering Complex Data Represented as Propositional Formulas. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_35
DOI: https://doi.org/10.1007/978-3-319-57529-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57528-5
Online ISBN: 978-3-319-57529-2
eBook Packages: Computer Science; Computer Science (R0)