DOI: 10.5555/1596374.1596403

Improving text classification by a sense spectrum approach to term expansion

Published: 04 June 2009

Abstract

Experimenting with different mathematical objects for text representation is an important step of building text classification models. In order to be efficient, such objects of a formal model, like vectors, have to reasonably reproduce language-related phenomena such as word meaning inherent in index terms. We introduce an algorithm for sense-based semantic ordering of index terms which approximates Cruse's description of a sense spectrum. Following semantic ordering, text classification by support vector machines can benefit from semantic smoothing kernels that regard semantic relations among index terms while computing document similarity. Adding expansion terms to the vector representation can also improve effectiveness. This paper proposes a new kernel which discounts less important expansion terms based on lexical relatedness.
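
The idea of a semantic smoothing kernel mentioned in the abstract can be illustrated with a minimal sketch. This is a generic form, k(x, y) = xᵀ S y, and not the kernel proposed in the paper; the three-term vocabulary, the relatedness values in `S`, and the function name `smoothed_kernel` are all assumptions made for illustration.

```python
def smoothed_kernel(x, y, S):
    """Return x^T S y for term-weight vectors x, y and a term-relatedness matrix S."""
    n = len(x)
    # First compute S y, then take the dot product with x.
    Sy = [sum(S[i][j] * y[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Sy[i] for i in range(n))


# Hypothetical vocabulary: ["car", "automobile", "banana"].
# S[i][j] is an assumed relatedness score between terms i and j.
S = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
IDENTITY = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_a = [1.0, 0.0, 0.0]  # mentions only "car"
doc_b = [0.0, 1.0, 0.0]  # mentions only "automobile"

# With the identity matrix the kernel reduces to the plain dot product,
# so documents without shared terms have zero similarity.
plain = smoothed_kernel(doc_a, doc_b, IDENTITY)

# With the relatedness matrix, related terms contribute to the similarity.
smoothed = smoothed_kernel(doc_a, doc_b, S)
```

With these toy values the plain dot product treats the two documents as unrelated, while the smoothed kernel gives them a non-zero similarity because "car" and "automobile" are marked as related. For use with a support vector machine, `S` would need to be symmetric positive semi-definite so that the result is a valid kernel.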

References

[1]
E. Agirre and O. L. De Lacalle. 2003. Clustering WordNet word senses. In Proceedings of RANLP-03, 4th International Conference on Recent Advances in Natural Language Processing, pages 121--130.
[2]
E. Agirre, E. Alfonseca, and O. L. de Lacalle. 2004. Approximating hierarchy-based similarity for WordNet nominal synsets using topic signatures. In Proceedings of GWC-04, 2nd Global WordNet Conference, pages 15--22.
[3]
R. Basili, M. Cammisa, and A. Moschitti. 2005. Effective use of WordNet semantics via kernel-based learning. In Proceedings of CoNLL-05, 9th Conference on Computational Natural Language Learning, pages 1--8.
[4]
S. Bloehdorn, R. Basili, M. Cammisa, and A. Moschitti. 2006. Semantic kernels for text classification based on topological measures of feature similarity. Proceedings of ICDM-06, 6th IEEE International Conference on Data Mining.
[5]
A. Budanitsky and G. Hirst. 2006. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1):13--47.
[6]
N. Cristianini, J. Shawe-Taylor, and H. Lodhi. 2002. Latent semantic kernels. Journal of Intelligent Information Systems, 18(2):127--152.
[7]
D. A. Cruse. 1986. Lexical semantics.
[8]
S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391--407.
[9]
C. Dorrer, P. Londero, M. Anderson, S. Wallentowitz, and I. A. Walmsley. 2001. Computing with interference: all-optical single-query 50-element database search. In Proceedings of QELS-01, Quantum Electronics and Laser Science Conference, pages 149--150.
[10]
E. Gabrilovich and S. Markovitch. 2005. Feature generation for text categorization using world knowledge. In Proceedings of IJCAI-05, 19th International Joint Conference on Artificial Intelligence, volume 19.
[11]
E. Hoenkamp. 2003. Unitary operators on the document space. Journal of the American Society for Information Science and Technology, 54(4):314--320.
[12]
A. Hotho, S. Staab, and G. Stumme. 2003. WordNet improves text document clustering. In Proceedings of SIGIR-03, 26th ACM International Conference on Research and Development in Information Retrieval.
[13]
J. Hu, L. Fang, Y. Cao, H. J. Zeng, H. Li, Q. Yang, and Z. Chen. 2008. Enhancing text clustering by leveraging Wikipedia semantics. In Proceedings of SIGIR-08, 31st ACM International Conference on Research and Development in Information Retrieval, pages 179--186.
[14]
J. J. Jiang and D. W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics, pages 19--33.
[15]
T. Joachims. 1998. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of ECML-98, 10th European Conference on Machine Learning, pages 137--142.
[16]
D. Lin. 1998. Automatic retrieval and clustering of similar words. In Proceedings of COLING-ACL Workshop on Usage of WordNet in Natural Language Processing Systems, volume 98, pages 768--773.
[17]
D. Mavroeidis, G. Tsatsaronis, M. Vazirgiannis, M. Theobald, and G. Weikum. 2005. Word sense disambiguation for exploiting hierarchical thesauri in text classification. Proceedings of PKDD-05, 9th European Conference on the Principles of Data Mining and Knowledge Discovery, pages 181--192.
[18]
S. Mohammad and G. Hirst. 2005. Distributional measures as proxies for semantic relatedness.
[19]
H. Paijmans. 1997. Gravity wells of meaning: detecting information-rich passages in scientific texts. Journal of Documentation, 53(5):520--536.
[20]
M. Palmer, H. T. Dang, and C. Fellbaum. 2006. Making fine-grained and coarse-grained sense distinctions, both manually and automatically. Natural Language Engineering, 13(02):137--163.
[21]
V. V. Raghavan and S. K. M. Wong. 1986. A critical analysis of vector space model for information retrieval. Journal of the American Society for Information Science, 37(5):279--287.
[22]
P. Resnik. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of IJCAI-95, 14th International Joint Conference on Artificial Intelligence, volume 1, pages 448--453.
[23]
J. Rodd, G. Gaskell, and W. Marslen-Wilson. 2002. Making sense of semantic ambiguity: Semantic competition in lexical access. Journal of Memory and Language, 46(2):245--266.
[24]
M. D. E. B. Rodriguez and J. M. G. Hidalgo. 1997. Using WordNet to complement training information in text categorisation. In Proceedings of RANLP-97, 2nd International Conference on Recent Advances in Natural Language Processing.
[25]
J. Shawe-Taylor and N. Cristianini. 2004. Kernel Methods for Pattern Analysis.
[26]
S. Shi, J. R. Wen, Q. Yu, R. Song, and W. Y. Ma. 2005. Gravitation-based model for information retrieval. In Proceedings of SIGIR-05, 28th ACM International Conference on Research and Development in Information Retrieval, pages 488--495. ACM New York, NY, USA.
[27]
G. Siolas and F. d'Alché Buc. 2000. Support vector machines based on a semantic kernel for text categorization. In Proceedings of IJCNN-00, IEEE International Joint Conference on Neural Networks.
[28]
C. J. van Rijsbergen. 2004. The Geometry of Information Retrieval.
[29]
P. Wittek and S. Darányi. 2007. Representing word semantics for IR by continuous functions. In S. Dominich and F. Kiss, editors, Proceedings of ICTIR-07, 1st International Conference of the Theory of Information Retrieval, pages 149--155.
[30]
P. Wittek, C. L. Tan, and S. Darányi. 2009. An ordering of terms based on semantic relatedness. In H. Bunt, editor, Proceedings of IWCS-09, 8th International Conference on Computational Semantics.
[31]
S. K. M. Wong, W. Ziarko, and P. C. N. Wong. 1985. Generalized vector space model in information retrieval. In Proceedings of SIGIR-85, 8th ACM International Conference on Research and Development in Information Retrieval, pages 18--25.

Cited By

  • (2011) Improving text classification with concept index terms and expansion terms. Proceedings of the 8th international conference on Advances in neural networks - Volume Part III, pages 485-492. DOI: 10.5555/2009463.2009524. Online publication date: 29-May-2011
  • (2010) Term ranking and categorization for ad-hoc navigation. Proceedings of the 14th international conference on Artificial intelligence: methodology, systems, and applications, pages 71-80. DOI: 10.5555/1885962.1885972. Online publication date: 8-Sep-2010
  • (2010) Matching evolving Hilbert spaces and language for semantic access to digital libraries. Proceedings of the role of digital libraries in a time of global change, and 12th international conference on Asia-Pacific digital libraries, pages 262-263. DOI: 10.5555/1875689.1875734. Online publication date: 21-Jun-2010

Published In

CoNLL '09: Proceedings of the Thirteenth Conference on Computational Natural Language Learning
June 2009
243 pages
ISBN: 9781932432299

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article
