Abstract
Given a data set consisting of n-dimensional binary vectors of positive and negative examples, a subset S of the attributes is called a support set if the positive and negative examples can be distinguished by using only the attributes in S. In this paper we consider several selection criteria for evaluating the “separation power” of support sets, and formulate combinatorial optimization problems for finding the “best and smallest” support sets with respect to such criteria. We provide efficient heuristics, some with guaranteed performance ratios, for the solution of these problems, analyze the distribution of small support sets in random examples, and present the results of computational experiments with the proposed algorithms.
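To make the notion concrete, the following minimal sketch (in Python, with illustrative names such as `is_support_set` and `smallest_support_set` that are not taken from the paper) shows the separation test behind the definition: S is a support set exactly when no positive example agrees with a negative example on every attribute in S. The exhaustive search over subsets is for illustration only; the paper is concerned with avoiding exactly this kind of brute force.

```python
from itertools import combinations

def is_support_set(pos, neg, S):
    """Return True if the attribute subset S separates pos from neg,
    i.e. no positive vector agrees with a negative vector on every
    attribute in S."""
    project = lambda v: tuple(v[i] for i in S)
    positive_projections = {project(v) for v in pos}
    return all(project(v) not in positive_projections for v in neg)

def smallest_support_set(pos, neg, n):
    """Exhaustive search for a smallest support set among n attributes.
    Exponential in n -- illustrative only; the paper instead develops
    efficient heuristics for this optimization problem."""
    for k in range(n + 1):
        for S in combinations(range(n), k):
            if is_support_set(pos, neg, S):
                return set(S)
    return None  # only possible if some vector appears as both positive and negative

# Tiny example: the first attribute alone already separates the two classes.
pos = [(1, 0, 1), (1, 1, 0)]
neg = [(0, 0, 1), (0, 1, 0)]
print(smallest_support_set(pos, neg, 3))  # {0}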
This work was partially supported by Grants in Aid from the Ministry of Education, Science, Sports and Culture of Japan (Grants 09044160 and 10205211). The visit of the first author to Kyoto University (January to March, 1999) was also supported by Grant 09044160. The research of the first and third authors was supported in part by the Office of Naval Research (Grant N00014-92-J-1375). The first author also thanks the National Science Foundation (Grant DMS 98-06389) and DARPA (Contract N66001-97-C-8537) for partial support.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boros, E., Horiyama, T., Ibaraki, T., Makino, K., Yagiura, M. (2000). Finding Essential Attributes in Binary Data. In: Leung, K.S., Chan, L.W., Meng, H. (eds) Intelligent Data Engineering and Automated Learning — IDEAL 2000. Data Mining, Financial Engineering, and Intelligent Agents. IDEAL 2000. Lecture Notes in Computer Science, vol 1983. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44491-2_20
DOI: https://doi.org/10.1007/3-540-44491-2_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41450-6
Online ISBN: 978-3-540-44491-6