DOI: 10.5555/646419.693652
Article

Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

Published: 16 April 2001

Abstract

Text categorization presents unique challenges due to the large number of attributes in the data set, the large number of training samples, attribute dependency, and the multi-modality of categories. Existing classification techniques have limited applicability to data sets of this nature. In this paper, we present Weight Adjusted k-Nearest Neighbor (WAKNN) classification, which learns feature weights using a greedy hill-climbing technique. We also present two performance optimizations of WAKNN that improve its computational performance by a few orders of magnitude without compromising classification quality. We experimentally evaluated WAKNN on 52 document data sets from a variety of domains and compared its performance against several classification algorithms, including C4.5, RIPPER, Naive Bayes, PEBLS, and VSM. Experimental results on these data sets confirm that WAKNN consistently outperforms the other classification algorithms in the comparison.
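The abstract describes WAKNN only at a high level. The Python sketch below illustrates the general idea of a weight-adjusted kNN classifier whose per-feature weights are tuned by greedy hill climbing. It is not the authors' algorithm: the weighted cosine similarity, the leave-one-out accuracy objective, the multiplicative step factors (0, 0.5, 2), and all function names are assumptions chosen for illustration, and neither of the paper's two performance optimizations is shown.

```python
import math


def cosine(a, b, w):
    """Cosine similarity of documents a and b after scaling feature f by w[f]."""
    aw = [x * wf for x, wf in zip(a, w)]
    bw = [x * wf for x, wf in zip(b, w)]
    na = math.sqrt(sum(x * x for x in aw))
    nb = math.sqrt(sum(x * x for x in bw))
    if na == 0 or nb == 0:
        return 0.0  # an all-zero document is treated as similar to nothing
    return sum(x * y for x, y in zip(aw, bw)) / (na * nb)


def knn_predict(train, labels, w, doc, k, skip=None):
    """Similarity-weighted vote among the k training documents most similar to doc."""
    sims = [(cosine(d, doc, w), i) for i, d in enumerate(train) if i != skip]
    sims.sort(key=lambda t: t[0], reverse=True)  # stable: ties keep index order
    votes = {}
    for s, i in sims[:k]:
        votes[labels[i]] = votes.get(labels[i], 0.0) + s
    return max(votes, key=votes.get)


def loo_accuracy(train, labels, w, k):
    """Leave-one-out accuracy; an ASSUMED objective, the paper's may differ."""
    hits = sum(knn_predict(train, labels, w, d, k, skip=i) == labels[i]
               for i, d in enumerate(train))
    return hits / len(train)


def hill_climb_weights(train, labels, k=1, factors=(0.0, 0.5, 2.0), max_rounds=10):
    """Greedy hill climbing: rescale one feature at a time, keep strict gains."""
    w = [1.0] * len(train[0])  # start from uniform weights
    best = loo_accuracy(train, labels, w, k)
    for _ in range(max_rounds):
        improved = False
        for f in range(len(w)):
            for c in factors:  # hypothetical multiplicative step sizes
                trial = w[:]
                trial[f] *= c
                score = loo_accuracy(train, labels, trial, k)
                if score > best:
                    w, best, improved = trial, score, True
        if not improved:
            break  # local optimum: no single-feature rescaling helps
    return w, best


# Toy data: feature 0 marks class A, feature 1 marks class B, feature 2 is noise.
X = [[1, 0, 0], [1, 0, 3], [0, 1, 3], [0, 1, 0]]
y = ["A", "A", "B", "B"]
w, acc = hill_climb_weights(X, y, k=1)
print(w, acc)  # → [1.0, 1.0, 0.0] 1.0
print(knn_predict(X, y, w, [2, 0, 5], k=1))  # → A
```

On this toy data the noise feature's weight is driven to 0, raising leave-one-out accuracy from 0.5 to 1.0. Every candidate weight change triggers a full leave-one-out pass, so this naive sketch costs O(n^2 m) similarity work per trial for n documents and m features.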

Published In

PAKDD '01: Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining
April 2001
592 pages

Publisher

Springer-Verlag
Berlin, Heidelberg
