DOI: 10.1145/1277741.1277922

A comparison of sentence retrieval techniques

Published: 23 July 2007

Abstract

Identifying redundant information in sentences is useful for several applications, such as summarization, document provenance, text reuse detection, and novelty detection. The task is defined as follows: given a query sentence, retrieve the sentences from a given collection that express all or some subset of the information present in the query sentence. Sentence retrieval techniques rank sentences by some measure of their similarity to a query, so their effectiveness depends on the similarity measure used. An effective retrieval model should handle low word overlap between the query and candidate sentences rather than rely on word overlap alone. Simple language modeling techniques such as query likelihood retrieval have outperformed TF-IDF and word-overlap-based methods for ranking sentences. In this paper, we compare the performance of sentence retrieval using different language modeling techniques for the problem of identifying redundant information.
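
To make the query likelihood idea mentioned in the abstract concrete, here is a minimal sketch, not the paper's implementation. Each candidate sentence is treated as a unigram language model and scored by the log-probability it assigns to the query sentence. The function names, the regex tokenizer, the choice of Dirichlet smoothing against the collection, and the value of `mu` are all illustrative assumptions; the abstract does not specify them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of word characters (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def rank_by_query_likelihood(query, sentences, mu=10.0):
    """Rank sentences by log P(query | sentence) under a smoothed unigram model.

    mu is the Dirichlet smoothing weight; 10.0 is an assumed value for short
    texts, not a setting taken from the paper.
    """
    tokenized = [tokenize(s) for s in sentences]
    collection = Counter(tok for toks in tokenized for tok in toks)
    collection_len = max(sum(collection.values()), 1)  # avoid division by zero
    query_terms = tokenize(query)

    scored = []
    for sentence, toks in zip(sentences, tokenized):
        counts = Counter(toks)
        score = 0.0
        for term in query_terms:
            # Background (collection) probability of the term; terms that never
            # occur in the collection get a small floor so log() stays defined.
            p_coll = collection[term] / collection_len
            if p_coll == 0.0:
                p_coll = 1e-9
            # Dirichlet-smoothed estimate of P(term | sentence).
            p = (counts[term] + mu * p_coll) / (len(toks) + mu)
            score += math.log(p)
        scored.append((score, sentence))

    # Highest log-likelihood first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    candidates = [
        "The cat sat on the mat",
        "Stocks fell sharply on Monday",
        "A cat was sitting on a mat",
    ]
    for score, sentence in rank_by_query_likelihood("cat on the mat", candidates):
        print(f"{score:8.3f}  {sentence}")
```

Because smoothing gives every query term a nonzero probability, a sentence that shares only some of the query's words still receives a finite score, which is one simple way a language model copes with partial overlap.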

    Published In

    SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
    July 2007
    946 pages
    ISBN:9781595935977
    DOI:10.1145/1277741

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. language modeling
    2. sentence retrieval

    Qualifiers

    • Article

    Conference

    SIGIR07: The 30th Annual International SIGIR Conference
    July 23 - 27, 2007
    Amsterdam, The Netherlands

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)3
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 12 Sep 2024

    Citations

    Cited By

    • (2023) Identifying Question Intent Similarity Using Computational Techniques. 2023 IEEE International Conference on Contemporary Computing and Communications (InC4), pages 1-5. DOI: 10.1109/InC457730.2023.10262888. Online publication date: 21-Apr-2023
    • (2021) Knowledge-based sentence semantic similarity: algebraical properties. Progress in Artificial Intelligence. DOI: 10.1007/s13748-021-00248-0. Online publication date: 21-Aug-2021
    • (2021) Aspect Fusion as Design Paradigm for Legal Information Retrieval. Intelligent Human Systems Integration 2021, pages 547-553. DOI: 10.1007/978-3-030-68017-6_81. Online publication date: 26-Jan-2021
    • (2020) SimiT: A Text Similarity Method Using Lexicon and Dependency Representations. New Generation Computing. DOI: 10.1007/s00354-020-00099-8. Online publication date: 17-Jun-2020
    • (2019) A framework for intelligent question answering system using semantic context-specific document clustering and Wordnet. Sādhanā, 44:3. DOI: 10.1007/s12046-018-1022-8. Online publication date: 18-Feb-2019
    • (2018) Intelligent sentence retrieval using semantic word based answer generation algorithm with cuckoo search optimization. Cluster Computing. DOI: 10.1007/s10586-018-2054-x. Online publication date: 23-Feb-2018
    • (2016) INEX Tweet Contextualization task. Information Processing and Management: an International Journal, 52:5, pages 801-819. DOI: 10.1016/j.ipm.2016.03.002. Online publication date: 1-Sep-2016
    • (2016) Efficient Set-Correlation Operator Inside Databases. Journal of Computer Science and Technology, 31:4, pages 683-701. DOI: 10.1007/s11390-016-1657-z. Online publication date: 8-Jul-2016
    • (2015) Methods for linking EHR notes to education materials. Information Retrieval Journal, 19:1-2, pages 174-188. DOI: 10.1007/s10791-015-9263-1. Online publication date: 3-Sep-2015
    • (2014) Comparison of Methods to Assess Similarity between Phrases. Advanced Information Systems Engineering, pages 255-263. DOI: 10.1007/978-3-319-12568-8_32. Online publication date: 2014
