Abstract
A typical machine learning classification task has two phases: training and prediction. This paper focuses on improving the efficiency of the prediction phase. When the number of classes is small, a linear search over the classes is an efficient way to find the most likely class. When the number of classes is large, however, linear search becomes inefficient. For example, applications such as geolocation or time-based classification might require millions of subclasses to fit the data. This paper describes a branch-and-bound method for finding the most likely class when the training examples can be partitioned into thousands of subclasses. To gauge its performance, we generated a synthetic set of random trees comprising billions of classes and evaluated branch-and-bound classification on them. Our results show that branch-and-bound classification is effective when the number of classes is large. Specifically, branch-and-bound improves search efficiency logarithmically.
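To make the contrast between linear search and branch-and-bound concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that each class is a 1-D centre, that the most likely class is the one nearest the query, and that classes are grouped in a balanced binary tree whose nodes store the range of centres beneath them. The names `Node`, `build`, `bound`, `linear_search`, and `branch_and_bound` are hypothetical; the abstract does not specify the paper's actual scoring function or bound.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    lo: float                      # smallest class centre under this node
    hi: float                      # largest class centre under this node
    label: Optional[int] = None    # set only on leaves (one leaf per class)
    children: Tuple["Node", ...] = ()


def build(centres: List[float], labels: List[int]) -> Node:
    """Build a balanced binary tree over class centres given in ascending order."""
    if len(centres) == 1:
        return Node(centres[0], centres[0], labels[0])
    mid = len(centres) // 2
    left = build(centres[:mid], labels[:mid])
    right = build(centres[mid:], labels[mid:])
    return Node(left.lo, right.hi, None, (left, right))


def bound(node: Node, x: float) -> float:
    """Lower bound on |x - centre| for every class under node (exact at a leaf)."""
    return max(node.lo - x, 0.0, x - node.hi)


def linear_search(centres: List[float], x: float) -> Tuple[int, int]:
    """Baseline: score every class; returns (label, classes examined)."""
    best = min(range(len(centres)), key=lambda i: abs(centres[i] - x))
    return best, len(centres)


def branch_and_bound(root: Node, x: float) -> Tuple[int, int]:
    """Best-first branch-and-bound; returns (label, nodes expanded)."""
    counter = 0  # tie-breaker so heapq never has to compare Node objects
    frontier = [(bound(root, x), counter, root)]
    expanded = 0
    while frontier:
        _, _, node = heapq.heappop(frontier)
        expanded += 1
        if node.label is not None:
            # Every remaining subtree has a bound at least this large,
            # so no unexplored class can be closer to x.
            return node.label, expanded
        for child in node.children:
            counter += 1
            heapq.heappush(frontier, (bound(child, x), counter, child))
    raise ValueError("empty tree")


if __name__ == "__main__":
    centres = [10.0 * i for i in range(1024)]
    root = build(centres, list(range(len(centres))))
    print(linear_search(centres, 3333.0))
    print(branch_and_bound(root, 3333.0))
```

Under these assumptions the bound never overestimates how close a subtree's best class can be, so the first leaf taken from the priority queue is an optimal answer. Subtrees whose bound is worse than that leaf are never expanded, which is where the saving over linear search comes from in this toy setting.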
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prieditis, A., Lee, M. (2013). When Classification becomes a Problem: Using Branch-and-Bound to Improve Classification Efficiency. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2013. Lecture Notes in Computer Science, vol 7988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39712-7_36
DOI: https://doi.org/10.1007/978-3-642-39712-7_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39711-0
Online ISBN: 978-3-642-39712-7
eBook Packages: Computer Science (R0)