Abstract
In decision trees, the lower branches are known to be less reliable than the upper branches because of the data fragmentation problem. As a result, decision trees may perform unnecessary attribute tests, since the tests they require are not the best for some subsets of the data objects. To compensate for this weakness, we suggest supplementing decision trees with reliable short rules, where the short rules come from a limited application of association rule mining algorithms. Experiments show that the method not only produces more reliable decisions but also saves test costs by using the short rules.
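The idea in the abstract can be sketched as a rule-first classifier: a confident short rule, if one matches, pre-empts the tree and avoids its remaining attribute tests; otherwise the tree is consulted as usual. The rule format, confidence threshold, attribute names, and the stand-in tree below are illustrative assumptions, not the paper's exact algorithm.

```python
# Short rules: (antecedent attribute-value pairs, predicted class, confidence).
# These would come from a limited association-rule search; values here are made up.
SHORT_RULES = [
    ({"outlook": "overcast"}, "play", 0.95),
    ({"outlook": "sunny", "humidity": "high"}, "no_play", 0.92),
]
MIN_CONFIDENCE = 0.90  # only rules at least this reliable may pre-empt the tree


def tree_predict(example):
    # Stand-in for a trained decision tree; each branch is a further attribute test.
    if example.get("outlook") == "rain":
        return "no_play" if example.get("wind") == "strong" else "play"
    return "play" if example.get("humidity") == "normal" else "no_play"


def predict(example):
    """Apply the most confident matching short rule; otherwise fall back to
    the decision tree, which may require additional attribute tests."""
    best = None
    for antecedent, label, conf in SHORT_RULES:
        if conf >= MIN_CONFIDENCE and all(
            example.get(attr) == val for attr, val in antecedent.items()
        ):
            if best is None or conf > best[1]:
                best = (label, conf)
    return best[0] if best else tree_predict(example)


print(predict({"outlook": "overcast", "humidity": "high"}))  # short rule fires
print(predict({"outlook": "rain", "wind": "strong"}))        # tree fallback
```

When a short rule fires, only the attributes in its antecedent need to be tested, which is where the test-cost saving comes from.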
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Sug, H. (2006). Using Reliable Short Rules to Avoid Unnecessary Tests in Decision Trees. In: Gelbukh, A., Reyes-Garcia, C.A. (eds) MICAI 2006: Advances in Artificial Intelligence. MICAI 2006. Lecture Notes in Computer Science(), vol 4293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925231_57
DOI: https://doi.org/10.1007/11925231_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49026-5
Online ISBN: 978-3-540-49058-6