Article, DOI: 10.5555/1793274.1793317

Viewing term proximity from a different perspective

Published: 30 March 2008

Abstract

This paper extends the state-of-the-art probabilistic model BM25 to exploit term proximity from a new perspective. Most previous work considers only dependencies between pairs of terms and treats phrases as additional independent evidence. Because a phrase overlaps with its component terms, it is difficult to estimate the importance of the phrase and its extra contribution to a relevance score. This paper proposes a new approach. First, query terms are grouped locally into non-overlapping phrases, each containing one or more query terms. Second, these phrases are not scored independently; instead, each provides a context for its component query terms. The relevance contribution of a term occurrence is measured by how many query terms occur in its context phrase and how compact they are. Third, term frequency is replaced by the accumulated relevance contribution, so term proximity integrates easily into the probabilistic model. Experimental results on the TREC-10 and TREC-11 collections show stable improvements in average precision and significant improvements in precision at top ranks.
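
The three steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the grouping rule (`max_gap`, and starting a new phrase when a term repeats), the density-based `contribution` function, and the helper names. Only the final step, substituting an accumulated contribution for term frequency inside the standard BM25 saturation formula, follows the abstract directly.

```python
from collections import defaultdict


def group_into_spans(doc_tokens, query_terms, max_gap=3):
    """Group query-term occurrences into non-overlapping spans.

    Assumed rule (not from the paper): consecutive hits at most
    `max_gap` positions apart share a span; a repeated term or a
    larger gap starts a new span.
    """
    spans, current = [], []
    for pos, tok in enumerate(doc_tokens):
        if tok not in query_terms:
            continue
        seen = {t for _, t in current}
        if current and (pos - current[-1][0] > max_gap or tok in seen):
            spans.append(current)
            current = []
        current.append((pos, tok))
    if current:
        spans.append(current)
    return spans


def contribution(span):
    """Hypothetical contribution: grows with the number of query terms
    in the span and with how compact the span is."""
    n = len(span)                          # distinct query terms in the span
    width = span[-1][0] - span[0][0] + 1   # tokens covered by the span
    return n * (n / width)                 # a lone term contributes exactly 1.0


def accumulated_contributions(doc_tokens, query_terms, max_gap=3):
    """Sum, per query term, the contributions of the spans it occurs in."""
    acc = defaultdict(float)
    for span in group_into_spans(doc_tokens, query_terms, max_gap):
        c = contribution(span)
        for _, term in span:
            acc[term] += c
    return dict(acc)


def bm25_term_score(tf_like, idf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Standard BM25 term weight; `tf_like` may be a raw term frequency
    or an accumulated contribution."""
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return idf * tf_like * (k1 + 1) / (tf_like + norm)


doc = "a b x x x x x a x b".split()
acc = accumulated_contributions(doc, {"a", "b"})
proximity_score = bm25_term_score(acc["a"], idf=1.0, doc_len=10, avg_doc_len=10)
plain_score = bm25_term_score(2, idf=1.0, doc_len=10, avg_doc_len=10)
```

In this toy document each query term occurs twice, always near the other query term, so under the assumed contribution function its accumulated value exceeds the raw frequency of 2 and the BM25 weight rises accordingly. A term that only ever occurs alone would keep its plain BM25 weight.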



    Published In

    ECIR'08: Proceedings of the IR research, 30th European conference on Advances in information retrieval
    March 2008
    718 pages
    ISBN: 3540786457
    Editors: Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, Ryen W. White

    Sponsors

    • Yahoo! Research
    • Google Inc.
    • Microsoft Research
    • Matrixware Information Services

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Cited By

    • (2019) Book search using social information, user profiles and query expansion with Pseudo Relevance Feedback. Applied Intelligence 49:6 (2178-2200). DOI: 10.1007/s10489-018-1383-z. Online publication date: 1-Jun-2019
    • (2017) Methods for retrieving alternative contract language using a prototype. Proceedings of the 16th edition of the International Conference on Artificial Intelligence and Law (179-187). DOI: 10.1145/3086512.3086530. Online publication date: 12-Jun-2017
    • (2016) Efficient and Effective Higher Order Proximity Modeling. Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval (21-30). DOI: 10.1145/2970398.2970404. Online publication date: 12-Sep-2016
    • (2016) A learning to rank approach for quality-aware pseudo-relevance feedback. Journal of the Association for Information Science and Technology 67:4 (942-959). DOI: 10.1002/asi.23430. Online publication date: 1-Apr-2016
    • (2015) Discovering and understanding word level user intent in Web search queries. Web Semantics: Science, Services and Agents on the World Wide Web 30:C (22-38). DOI: 10.1016/j.websem.2014.07.010. Online publication date: 1-Jan-2015
    • (2014) Semantic Matching in Search. Foundations and Trends in Information Retrieval 7:5 (343-469). DOI: 10.1561/1500000035. Online publication date: 12-Jun-2014
    • (2014) How Effective are Proximity Scores in Term Dependency Models? Proceedings of the 19th Australasian Document Computing Symposium (89-92). DOI: 10.1145/2682862.2682876. Online publication date: 26-Nov-2014
    • (2014) A Comparison of Retrieval Models using Term Dependencies. Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (111-120). DOI: 10.1145/2661829.2661894. Online publication date: 3-Nov-2014
    • (2014) A simple term frequency transformation model for effective pseudo relevance feedback. Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (323-332). DOI: 10.1145/2600428.2609636. Online publication date: 3-Jul-2014
    • (2014) An enhanced context-sensitive proximity model for probabilistic information retrieval. Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (1131-1134). DOI: 10.1145/2600428.2609527. Online publication date: 3-Jul-2014
