Abstract
“Logical analysis of data” (LAD) is a methodology developed since the late eighties, aimed at discovering hidden structural information in data sets. LAD was originally developed for analyzing binary data by using the theory of partially defined Boolean functions. An extension of LAD for the analysis of numerical data sets is achieved through the process of “binarization” consisting in the replacement of each numerical variable by binary “indicator” variables, each showing whether the value of the original variable is above or below a certain level. Binarization was successfully applied to the analysis of a variety of real life data sets. This paper develops the theoretical foundations of the binarization process studying the combinatorial optimization problems related to the minimization of the number of binary variables. To provide an algorithmic framework for the practical solution of such problems, we construct compact linear integer programming formulations of them. We develop polynomial time algorithms for some of these minimization problems, and prove NP-hardness of others.
Similar content being viewed by others
References
D. Angluin, Queries and concept learning,Machine Learning 2 (1988) 319–342.
E. Balas and A. Ho, Set covering algorithms using cutting planes, heuristics and subgradient optimization: A computational study,Mathematical Programming 12 (1980) 37–60.
R. Bellman,Introduction to Matrix Analysis (McGraw-Hill, New York, 1960).
J.C. Bioch and T. Ibaraki, Complexity of identification and dualization of positive Boolean functions,Information and Computation 123 (1995) 50–63.
E. Boros, V. Gurvich, P.L. Hammer, T. Ibaraki and A. Kogan, Decompositions of partially defined Boolean functions,Discrete Applied Mathematics 62 (1995) 51–75.
E. Boros, P.L. Hammer, T. Ibaraki, A. Kogan, Logical analysis of numerical data, RUTCOR Research Report RRR 04-97, RUTCOR, Rutgers University, 1997.
E. Boros, P.L. Hammer, T. Ibaraki, A. Kogan, E. Mayoraz and I. Muchnik, An implementation of logical analysis of data, RUTCOR Research Report RRR 22-96, RUTCOR, Rutgers University, 1996.
E. Boros, T. Ibaraki and K. Makino, Error-free and best-fit extensions of partially defined Boolean functions, RUTCOR Research Report RRR 14-95, RUTCOR, Rutgers University, 1995.
E. Boros, T. Ibaraki and K. Makino, Extensions of partially defined Boolean functions with missing data, RUTCOR Research Report RRR 06-96, RUTCOR, Rutgers University, 1996.
P. Clark and T. Niblett, The CN2 induction algorithm,Machine Learning 3 (1989) 261–283.
Y. Crama, P.L. Hammer and T. Ibaraki, Cause-effect relationships and partially defined Boolean functions,Annals of Operations Research 16 (1988) 299–326.
R. Dechter and J. Pearl, Structure identification in relational data,Artificial Intelligence 58 (1992) 237–270.
O. Ekin, P.L. Hammer and A. Kogan, On connected Boolean functions, RUTCOR Research Report RRR 36-96, RUTCOR, Rutgers University, 1996.
M.R. Garey and D.S. Johnson,Computers and Intractability (Freeman, New York, 1979).
P.L. Hammer, Partially defined Boolean functions and cause-effect relationships, in:Proceedings of the International Conference on Multi-Attribute Decision Making Via OR-Based Expert Systems, University of Passau, Passau, Germany, 1986.
A.B. Hammer, P.L. Hammer and I. Muchnik, Logical analysis of Chinese productivity patterns, RUTCOR Report RR 1-96, RUTCOR, Rutgers University, 1996.
S.J. Hong, R-MINI: An iterative approach for generating minimal rules from examples,IEEE Transactions on Knowledge and Data Engineering, to appear.
O.L. Mangasarian, Mathematical programming in machine learning, in: G. Di Pillo and F. Giannessi, eds.,Nonlinear Optimization and Applications (Plenum Publishing, New York, 1996) 283–295.
I. Muchnik and L. Yampolsky, A pilot study of the concurrent validity of logical analysis of data, as applied to two Beck inventories, RUTCOR Technical Report RTR 02-95, RUTCOR, Rutgers University, 1995.
S. Muroga,Threshold Logic and Its Applications (Wiley-Interscience, New York, 1971).
P.M. Murphy and D.W. Aha, UCI repository of machine learning databases: Machine readable data repository, Department of Computer Science, University of California, Irvine, 1994.
J.R. Quinlan, Induction of decision trees,Machine Learning 1 (1986) 81–106.
J.R. Quinlan and R. Rivest, Inferring decision trees using minimum description length principle,Information and Computation 80 (1989) 227–248.
L.G. Valiant, A theory of the learnable,Communications of the ACM 27 (1984) 1134–1142.
Author information
Authors and Affiliations
Additional information
The authors gratefully acknowledge the partial support by the Office of Naval Research (grants N00014-92-J1375 and N00014-92-J4083).
Rights and permissions
About this article
Cite this article
Boros, E., Hammer, P.L., Ibaraki, T. et al. Logical analysis of numerical data. Mathematical Programming 79, 163–190 (1997). https://doi.org/10.1007/BF02614316
Received:
Accepted:
Issue Date:
DOI: https://doi.org/10.1007/BF02614316