Abstract
We summarize the recent developments of a general theory of information distance and its applications in whole genome phylogeny, document comparison, internet query-answer systems, and many other data mining tasks. We also solve an open problem regarding the universality of the normalized information distance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Ané, C., Sanderson, M.J.: Missing the Forest for the Trees: Phylogenetic Compression and Its Implications for Inferring Complex Evolutionary Histories. Systematic Biology 54(1), 146–157 (2005)
Bennett, C.H., Gacs, P., Li, M., Vitanyi, P., Zurek, W.: Information Distance. IEEE Trans. Inform. Theory 44(4), 1407–1423 (1998) (STOC, 1993)
Bennett, C.H., Li, M., Ma, B.: Chain letters and evolutionary histories. Scientific American 288(6), 76–81 (2003) (feature article)
Benedetto, D., Caglioti, E., Loreto, V.: Language trees and zipping. Phys. Rev. Lett. 88(4), 048702 (2002)
Chen, X., Francia, B., Li, M., Mckinnon, B., Seker, A.: Shared information and program plagiarism detection. IEEE Trans. Information Theory 50(7), 1545–1550 (2004)
Chernov, A.V., Muchnik, A., Romashchenko, A.E., Shen, A.K., Vereshchagin, N.K.: Upper semi-lattice of binary strings with the relation x is simple conditional to y. Theoret. Comput. Sci. 271, 69–95 (2002)
Cilibrasi, R., Vitányi, P.M.B., de Wolf, R.: Algorithmic clustring of music based on string compression. Comput. Music J. 28(4), 49–67 (2004)
Cilibrasi, R., Vitányi, P.M.B.: Automatic semantics using Google (manuscript, 2005) (2004), http://arxiv.org/abs/cs.CL/0412098
Cilibrasi, R., Vitányi, P.M.B.: Clustering by compression. IEEE Trans. Inform. Theory 51(4), 1523–1545 (2005)
Cuturi, M., Vert, J.P.: The context-tree kernel for strings. Neural Networks 18(4), 1111–1123 (2005)
Emanuel, K., Ravela, S., Vivant, E., Risi, C.: A combined statistical-deterministic approach of hurricane risk assessment. Program in Atmospheres, Oceans, and Climate. MIT (manuscript, 2005)
Hao, Y., Zhang, X., Zhu, X., Li, M.: Conditional normalized information distance (manuscript, 2006)
Keogh, E., Lonardi, S., Ratanamahatana, C.A.: Towards parameter-free data mining. In: KDD 2004, pp. 206–215 (2004)
Kirk, S.R., Jenkins, S.: Information theory-based software metrics and obfuscation. J. Systems and Software 72, 179–186 (2004)
Kraskov, A., Stögbauer, H., Andrzejak, R.G., Grassberger, P.: Hierarchical clustering using mutual information. Europhys. Lett. 70(2), 278–284 (2005)
Kocsor, A., Kertesz-Farkas, A., Kajan, L., Pongor, S.: Application of compression-based distance measures to protein sequence classification: a methodology study. Bioinformatics 22(4), 407–412 (2006)
Krasnogor, N., Pelta, D.A.: Measuring the similarity of protein structures by means of the universal similarity metric. Bioinformatics 20(7), 1015–1021 (2004)
Li, M., Badger, J., Chen, X., Kwong, S., Kearney, P., Zhang, H.: An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 17(2), 149–154 (2001)
Li, M., Chen, X., Li, X., Ma, B., Vitanyi, P.M.B.: The similarity metric. IEEE Trans. Information Theory 50(12), 3250–3264 (2004)
Li, M., Vitanyi, P.: An introduction to Kolmogorov complexity and its applications, 2nd edn., p. 637. Springer, Heidelberg (1997)
Muchnik, A.: Conditional comlexity and codes. Theoretical Computer Science 271(1), 97–109 (2002)
Muchnik, A., Vereshchagin, N.K.: Logical operations and Kolmogorov complexity II. In: Proc. 16th Conf. Comput. Complexity, pp. 256–265 (2001)
Otu, H.H., Sayood, K.: Bioinformatics. A new sequence distance measure for phylogenetic tree construction 19(6), 2122–2130 (2003)
Pao, H.K., Case, J.: Computing entropy for ortholog detection. In: Int’l Conf. Comput. Intell., Istanbul Turkey, December 17-19 (2004)
Parry, D.: Use of Kolmogorov distance identification of web page authorship, topic and domain. In: Workshop on Open Source Web Inf. Retrieval (2005), http://www.emse.fr/OSWIR05/
Costa Santos, C., Bernardes, J., Vitányi, P.M.B., Antunes, L.: Clustering fetal heart rate tracings by compression. In: Proc. 19th IEEE Intn’l Symp. Computer-Based Medical Systems, Salt Lake City, Utah, June 22-23 (2006)
Shen, A.K., Vereshchagin, N.K.: Logical operations and Kolmogorov complexity. Theoret. Comput. Sci. 271, 125–129 (2002)
Taha, W., Crosby, S., Swadi, K.: A new approach to data mining for software design, Rice Univ. (manuscript, 2006)
Varre, J.S., Delahaye, J.P., Rivals, E.: Transformation distances: a family of dissimilarity measures based on movements of segments. Bioinformatics 15(3), 194–202 (1999)
Vereshchagin, N.K., V’yugin, M.V.: Independent minimum length programs to translate between given strings. Theoret. Comput. Sci. 271, 131–143 (2002)
V’yugin, M.V.: Information distance and conditional complexities. Theoret. Comput. Sci. 271, 145–150 (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, M. (2006). Information Distance and Its Applications. In: Ibarra, O.H., Yen, HC. (eds) Implementation and Application of Automata. CIAA 2006. Lecture Notes in Computer Science, vol 4094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11812128_1
Download citation
DOI: https://doi.org/10.1007/11812128_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37213-4
Online ISBN: 978-3-540-37214-1
eBook Packages: Computer ScienceComputer Science (R0)