DOI: 10.5555/1631862.1631865

Measuring the semantic similarity of texts

Published: 30 June 2005

Abstract

This paper presents a knowledge-based method for measuring the semantic similarity of texts. While there is a large body of previous work focused on finding the semantic similarity of concepts and words, the application of these word-oriented methods to text similarity has not yet been explored. In this paper, we introduce a method that combines word-to-word similarity metrics into a text-to-text metric, and we show that this method outperforms the traditional text similarity metrics based on lexical matching.
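
The abstract does not give the combination formula, so the following is only a minimal sketch of one plausible way to lift a word-to-word score into a text-to-text score: each word is matched to its most similar word in the other text, the best scores are averaged, and the two directions are averaged in turn. The function names and the toy similarity table are hypothetical illustrations, not the paper's metric or data; a knowledge-based word measure would take the place of `toy_word_sim`.

```python
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]

# Hypothetical hand-made table standing in for a knowledge-based word measure.
_RELATED = {frozenset({"car", "automobile"}): 0.8}


def toy_word_sim(w1: str, w2: str) -> float:
    """Toy word-to-word similarity in [0, 1] (assumption, for illustration only)."""
    if w1 == w2:
        return 1.0
    return _RELATED.get(frozenset({w1, w2}), 0.0)


def exact_match_sim(w1: str, w2: str) -> float:
    """Purely lexical baseline: 1.0 for identical words, otherwise 0.0."""
    return 1.0 if w1 == w2 else 0.0


def directional_similarity(source: Sequence[str], target: Sequence[str],
                           word_sim: WordSim) -> float:
    """Average, over the source words, of each word's best match in the target."""
    if not source or not target:
        return 0.0
    best = [max(word_sim(w, t) for t in target) for w in source]
    return sum(best) / len(source)


def text_similarity(a: Sequence[str], b: Sequence[str], word_sim: WordSim) -> float:
    """Symmetric text-to-text score: mean of the two directional scores."""
    return 0.5 * (directional_similarity(a, b, word_sim)
                  + directional_similarity(b, a, word_sim))


if __name__ == "__main__":
    t1 = ["the", "car", "stopped"]
    t2 = ["the", "automobile", "stopped"]
    print(text_similarity(t1, t2, toy_word_sim))     # word-similarity-based score
    print(text_similarity(t1, t2, exact_match_sim))  # lexical-matching baseline
```

With the toy table, the pair of sentences above scores higher under the word-similarity measure than under exact lexical matching, because "car" and "automobile" contribute a partial match instead of zero.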

Published In

EMSEE '05: Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment
June 2005
69 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Cited By

  • (2019) Paraphrase plagiarism identification with character-level features. Pattern Analysis & Applications 22:2 (669-681). DOI: 10.1007/s10044-017-0674-z. Online publication date: 1-May-2019
  • (2017) Context Comparison of Essay-Type Text Files. Proceedings of the 1st International Conference on Algorithms, Computing and Systems (93-97). DOI: 10.1145/3127942.3127962. Online publication date: 10-Aug-2017
  • (2017) DSRIM. Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval (19-26). DOI: 10.1145/3121050.3121063. Online publication date: 1-Oct-2017
  • (2017) NELasso: Group-Sparse Modeling for Characterizing Relations Among Named Entities in News Articles. IEEE Transactions on Pattern Analysis and Machine Intelligence 39:10 (2000-2014). DOI: 10.1109/TPAMI.2016.2632117. Online publication date: 1-Oct-2017
  • (2016) Amplifying scientific paper's abstract by leveraging data-weighted reconstruction. Information Processing and Management: an International Journal 52:4 (698-719). DOI: 10.1016/j.ipm.2015.12.014. Online publication date: 1-Jul-2016
  • (2016) A novel framework for social web forums' thread ranking based on semantics and post quality features. The Journal of Supercomputing 72:11 (4276-4295). DOI: 10.1007/s11227-016-1839-z. Online publication date: 1-Nov-2016
  • (2015) Automatic Generation and Insertion of Assessment Items in Online Video Courses. Companion Proceedings of the 20th International Conference on Intelligent User Interfaces (1-4). DOI: 10.1145/2732158.2732183. Online publication date: 29-Mar-2015
  • (2015) Uncovering highly obfuscated plagiarism cases using fuzzy semantic-based similarity model. Journal of King Saud University - Computer and Information Sciences 27:3 (248-268). DOI: 10.1016/j.jksuci.2014.12.001. Online publication date: 1-Jul-2015
  • (2015) Boosting paraphrase detection through textual similarity metrics with abductive networks. Applied Soft Computing 26:C (444-453). DOI: 10.1016/j.asoc.2014.10.021. Online publication date: 1-Jan-2015
  • (2015) From senses to texts. Artificial Intelligence 228:C (95-128). DOI: 10.1016/j.artint.2015.07.005. Online publication date: 1-Nov-2015
