We study a practical extension to the Valiant model of machine learning from examples [v84]: the presence of errors, possibly maliciously generated by an adversary, in the sample data. Recent papers have made progress in the Valiant model by providing algorithms for ...
We study the computational feasibility of learning boolean expressions from examples. Our goals are to prove results and develop general techniques that shed light on the boundary between the classes of expressions that are learnable in polynomial time and those that are ...
While Kolmogorov complexity is the accepted absolute measure of information content in an individual finite object, a similarly absolute notion is needed for the information distance between two individual objects, for example, two pictures. We give several natural definitions of a universal information metric, based on length of shortest programs for either ordinary computations or reversible (dissipationless) computations. It turns out that these definitions are equivalent up to an additive logarithmic term. We show that the information distance is a universal cognitive similarity distance. We investigate the maximal correlation of the shortest programs involved, the maximal uncorrelation of programs (a generalization of the Slepian-Wolf theorem of classical information theory), and the density properties of the discrete metric spaces induced by the information distances. A related distance measures the amount of nonreversibility of a computation. Using the physical theory of reversible computation, we give an appropriate (universal, anti-symmetric, and transitive) measure of the thermodynamic work required to transform one object into another object by the most efficient process. Information distance between individual objects is needed in pattern recognition, where one wants to express effective notions of "pattern similarity" or "cognitive similarity" between individual objects, and in thermodynamics of computation, where one wants to analyse the energy dissipation of a computation from a particular input to a particular output.
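As a rough illustration of the kind of definition the abstract above refers to, one commonly cited form of the information distance can be written in terms of conditional Kolmogorov complexity. The symbols $K$ and $E_1$ are notation assumed here for illustration; the abstract itself does not fix them.

```latex
% Assumed notation: K(x \mid y) is the length of a shortest program
% that outputs x when given y as auxiliary input.
\[
  E_1(x, y) \;=\; \max\{\, K(x \mid y),\; K(y \mid x) \,\}
\]
% The abstract states that the alternative definitions it considers
% agree with one another up to an additive logarithmic term.
```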
A new class of metrics appropriate for measuring effective similarity relations between sequences, say one type of similarity per metric, is studied. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it minorizes every metric in the class (that is, it is universal in that it discovers all effective similarities). We demonstrate that it too is a metric and takes values in [0, 1]; hence it may be called the similarity metric. This is a theoretical foundation for a new general practical tool. We give two distinctive applications in widely divergent areas (the experiments by necessity use just computable approximations to the target notions). First, we computationally compare whole mitochondrial genomes and infer their evolutionary history. This results in a first completely automatic computed whole mitochondrial phylogeny tree. Secondly, we give a fully automatically computed language tree of 52 different languages based on translated versions of the "Universal Declaration of Human Rights".
Bioinformatics/computer Applications in The Biosciences, 2001
Motivation: Traditional sequence distances require an alignment and therefore are not directly applicable to the problem of whole genome phylogeny, where events such as rearrangements make full-length alignments impossible. We present a sequence distance that works on unaligned sequences using the information-theoretic concept of Kolmogorov complexity, and a program to estimate this distance. Results: We establish the mathematical foundations of our distance and illustrate its use by constructing a phylogeny of the Eutherian orders using complete unaligned mitochondrial genomes. This phylogeny is consistent with the commonly accepted one for the Eutherians. A second, larger mammalian dataset is also analyzed, yielding a phylogeny generally consistent with the commonly accepted one for the mammals. Availability: The program to estimate our sequence distance is available at
A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance," based on the noncomputable notion of Kolmogorov complexity, and show that it is in this class and it minorizes every computable distance in the class (that is, it is universal in that it discovers all computable similarities). We demonstrate that it is a metric and call it the similarity metric. This theory forms the foundation for a new practical tool. To evidence generality and robustness, we give two distinctive applications in widely divergent areas using standard compression programs like gzip and GenCompress. First, we compare whole mitochondrial genomes and infer their evolutionary history. This results in a first completely automatic computed whole mitochondrial phylogeny tree. Secondly, we fully automatically compute the language tree of 52 different languages.
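The idea of approximating a Kolmogorov-complexity-based distance with a standard compressor can be sketched as follows. This is a minimal sketch, not the authors' procedure: it uses the widely used normalized compression distance form, in which the compressed length C stands in for K, and it substitutes Python's zlib for gzip or GenCompress. The function names and the synthetic DNA-like inputs are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a computable stand-in for K)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Synthetic inputs for illustration only (assumption: not real genome data).
rng = random.Random(0)
base = bytes(rng.choice(b"ACGT") for _ in range(4000))

# "near" is a copy of base with a point change every 400 positions.
mutated = bytearray(base)
for i in range(0, len(mutated), 400):
    mutated[i] = ord("C") if mutated[i] == ord("A") else ord("A")
near = bytes(mutated)

# "far" is an independently generated sequence of the same length.
far = bytes(rng.choice(b"ACGT") for _ in range(4000))

print("ncd(base, near) =", ncd(base, near))
print("ncd(base, far)  =", ncd(base, far))
```

With a real compressor the values are only approximations, so they need not satisfy the metric properties exactly; the point of the sketch is that the closely related pair should score lower than the unrelated pair.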
Papers by Ming Li