Abstract
Decision trees can be very useful data mining tools for human experts diagnosing disease, because the learned knowledge is represented as a tree. However, a satisfactory decision tree may not be obtained if the data set does not contain enough consistent instances. Two relatively small liver disorder data sets, one from America and one from India, have recently become available. To generate more accurate and useful decision trees for the disease, this paper suggests appropriate sampling of the data instances that belong to the class with the higher error rate. Experiments with the two public domain data sets and a representative decision tree algorithm, C4.5, show very successful results.
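The sampling idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the sampling procedure, so the following is an assumption rather than the paper's method: it measures the error rate of each class from the predictions of an already trained tree, then randomly duplicates instances of the class with the higher error rate. The function names, the `factor` parameter, and the toy data are hypothetical, and no decision tree learner is included; the predictions would come from whichever C4.5 implementation is in use.

```python
import random
from collections import Counter


def per_class_error_rate(y_true, y_pred):
    """Return the fraction of misclassified instances for each true class."""
    totals = Counter(y_true)
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {label: errors[label] / totals[label] for label in totals}


def oversample_worst_class(rows, y_true, y_pred, factor=2, seed=0):
    """Duplicate instances of the class with the highest error rate.

    Assumption: plain random oversampling with replacement. `factor` is a
    hypothetical knob; factor=2 roughly doubles the size of that class.
    """
    rates = per_class_error_rate(y_true, y_pred)
    worst = max(rates, key=rates.get)
    pool = [r for r, t in zip(rows, y_true) if t == worst]
    rng = random.Random(seed)
    extra = [rng.choice(pool) for _ in range((factor - 1) * len(pool))]
    new_rows = list(rows) + extra
    new_labels = list(y_true) + [worst] * len(extra)
    return new_rows, new_labels, worst


# Toy data (invented for illustration, not taken from the liver data sets).
rows = [[1], [2], [3], [4], [5], [6]]
y_true = ["sick", "sick", "healthy", "healthy", "healthy", "healthy"]
y_pred = ["healthy", "sick", "healthy", "healthy", "healthy", "sick"]

rates = per_class_error_rate(y_true, y_pred)
new_rows, new_labels, worst = oversample_worst_class(rows, y_true, y_pred)
print(rates, worst, len(new_rows))
```

The enlarged training set would then be passed back to the tree learner, and the per-class error rates compared before and after on held-out data.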
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sug, H. (2012). Better Decision Tree Induction for Limited Data Sets of Liver Disease. In: Kim, Th., Kang, JJ., Grosky, W.I., Arslan, T., Pissinou, N. (eds) Computer Applications for Bio-technology, Multimedia, and Ubiquitous City. BSBT MulGraB IUrC 2012 2012 2012. Communications in Computer and Information Science, vol 353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35521-9_12
DOI: https://doi.org/10.1007/978-3-642-35521-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35520-2
Online ISBN: 978-3-642-35521-9
eBook Packages: Computer Science, Computer Science (R0)