A Survey of Text Classification Algorithms

  • Chapter
Mining Text Data

Abstract

The problem of classification has been widely studied in the data mining, machine learning, database, and information retrieval communities, with applications in a number of diverse domains such as target marketing, medical diagnosis, newsgroup filtering, and document organization. In this chapter, we survey a wide variety of text classification algorithms.

Author information

Corresponding author

Correspondence to Charu C. Aggarwal.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Aggarwal, C.C., Zhai, C. (2012). A Survey of Text Classification Algorithms. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_6

  • DOI: https://doi.org/10.1007/978-1-4614-3223-4_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4614-3222-7

  • Online ISBN: 978-1-4614-3223-4

  • eBook Packages: Computer Science, Computer Science (R0)
