Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/1458082.1458361acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
poster

Incorporating topical support documents into a small training set in text categorization

Published: 26 October 2008 Publication History

Abstract

This paper explores the incorporation of topical support documents into a training set as a means of compensating for a shortage of positive training data in text categorization. To support topical representation, our method applies a simple transformation to documents, i.e., making new documents from existing positive documents by squaring a conventional term weight. The topical support documents thus created not only are expected to preserve the topic, but even improve the topical representation by emphasizing terms with higher weights. Experiments with support vector machines showed the effectiveness on RCV1 collection with a small number of positive training data. Our topical support representation achieved 52.01% and 8.83% improvements for 33 and 56 categories of RCV1 Topic in micro-averaged F1 with less than 100 and 300 positive documents in learning, respectively. Result analyses based on robustness indicate that topical support documents contribute to a steady and stable improvement.

References

[1]
DeCoste, D. and Schölkopf, B. 2002. Training invariant support vector machines. Machine Learning 46(1).
[2]
Joachims, T. 1998. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Proc. of the European Conference on Machine Learning (ECML).
[3]
Lee, K.-S., Kageura, K. 2007. Virtual relevant documents in text categorization with support vector machines, Information Processing & Management, 43(4), Elsvier.
[4]
Lewis, D. D., Yang, Y., Rose, T., and Li, F. 2004. RCV1: A New Benchmark Collection for Text Categorization Research. Journal of Machine Learning Research, 5:361--397.
[5]
Sassano, M. 2003. Virtual Examples for Text Classification with Support Vector Machines, In Proc. of the 2003 conference on Empirical methods in natural language processing, pp. 208--215.
[6]
Shen, D., Pan, R., Sun, J.-T., Pan, J., Wu, K., Yin, J., and Yang, Q. 2006. Query enrichment for web-query classification. ACM Transaction on Information Systems (TOIS), 24(3), pp. 320--352.
[7]
Yang, Y. 2001. A study on thresholding strategies for text categorization. In Proc. of 24th ACM SIGIR Conference, pp 137--145.

Index Terms

  1. Incorporating topical support documents into a small training set in text categorization

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
      October 2008
      1562 pages
      ISBN:9781595939913
      DOI:10.1145/1458082
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 26 October 2008

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. support vector machine
      2. text categorization
      3. topical support representation
      4. transformation

      Qualifiers

      • Poster

      Conference

      CIKM08
      CIKM08: Conference on Information and Knowledge Management
      October 26 - 30, 2008
      California, Napa Valley, USA

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Upcoming Conference

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • 0
        Total Citations
      • 168
        Total Downloads
      • Downloads (Last 12 months)0
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 03 Oct 2024

      Other Metrics

      Citations

      View Options

      Get Access

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media