DOI: 10.1145/1718487.1718493
research-article

Query reformulation using anchor text

Published: 04 February 2010

Abstract

Query reformulation techniques based on query logs have been studied as a method of capturing user intent and improving retrieval effectiveness. The evaluation of these techniques, however, has focused primarily on proprietary query logs and selected samples of queries. In this paper, we suggest that anchor text, which is readily available, can be an effective substitute for a query log, and we study the effectiveness of a range of query reformulation techniques (including log-based stemming, substitution, and expansion) using standard TREC collections. Our results show that log-based query reformulation techniques are indeed effective with standard collections, but expansion is a much safer form of query modification than word substitution. We also show that using anchor text as a simulated query log is at least as effective as a real log for these techniques.
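
The idea of treating anchor text as a simulated query log can be illustrated with a minimal sketch. It assumes each anchor text is handled like one logged query and uses a plain co-occurrence count to pick expansion terms. This heuristic is an illustrative stand-in, not the method evaluated in the paper, and the function names `build_cooccurrence` and `expand_query` and the toy anchor strings are hypothetical.

```python
from collections import Counter, defaultdict


def build_cooccurrence(anchor_texts):
    """Count how often two distinct terms appear in the same anchor text."""
    cooc = defaultdict(Counter)
    for text in anchor_texts:
        terms = set(text.lower().split())
        for t in terms:
            for u in terms:
                if t != u:
                    cooc[t][u] += 1
    return cooc


def expand_query(query, cooc, k=2):
    """Keep the original terms and append up to k co-occurring terms."""
    q_terms = query.lower().split()
    scores = Counter()
    for t in q_terms:
        for u, n in cooc.get(t, {}).items():
            if u not in q_terms:
                scores[u] += n
    # Highest count first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return q_terms + [u for u, _ in ranked[:k]]


# Toy "anchor log" (hypothetical data): each string stands in for one query.
anchors = [
    "cheap flights",
    "cheap airline tickets",
    "cheap flights tickets",
    "airline tickets",
]
cooc = build_cooccurrence(anchors)
expanded = expand_query("cheap", cooc, k=2)
print(expanded)
```

Because expansion only appends terms, the original query terms always survive. Substitution would replace them, which is one intuition for why the abstract describes expansion as the safer modification.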


Published In

WSDM '10: Proceedings of the third ACM international conference on Web search and data mining
February 2010
468 pages
ISBN: 9781605588896
DOI: 10.1145/1718487
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. anchor log
  2. anchor text
  3. query expansion
  4. query log
  5. query reformulation
  6. query substitution

Qualifiers

  • Research-article

Conference

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 10
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 30 Aug 2024

Citations

Cited By

  • (2024) Incorporating Query Recommendation for Improving In-Car Conversational Search. Advances in Information Retrieval, pages 304-312. DOI: 10.1007/978-3-031-56069-9_36. Online publication date: 23-Mar-2024
  • (2024) Measuring the retrievability of digital library content using analytics data. Journal of the Association for Information Science and Technology. DOI: 10.1002/asi.24886. Online publication date: 19-Mar-2024
  • (2023) Search Result Diversification Using Query Aspects as Bottlenecks. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 3040-3051. DOI: 10.1145/3583780.3615050. Online publication date: 21-Oct-2023
  • (2023) The Infinite Index: Information Retrieval on Generative Text-To-Image Models. Proceedings of the 2023 Conference on Human Information Interaction and Retrieval, pages 172-186. DOI: 10.1145/3576840.3578327. Online publication date: 19-Mar-2023
  • (2023) Zero-shot Clarifying Question Generation for Conversational Search. Proceedings of the ACM Web Conference 2023, pages 3288-3298. DOI: 10.1145/3543507.3583420. Online publication date: 30-Apr-2023
  • (2023) Graph-Attention-Network-Based Cost Estimation Model in Materialized View Environment. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), pages 1388-1396. DOI: 10.1109/ICPADS60453.2023.00198. Online publication date: 17-Dec-2023
  • (2022) How to Approach Ambiguous Queries in Conversational Search: A Survey of Techniques, Approaches, Tools, and Challenges. ACM Computing Surveys, 55:6, pages 1-40. DOI: 10.1145/3534965. Online publication date: 7-Dec-2022
  • (2022) Towards Explainable Search Results. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 669-680. DOI: 10.1145/3477495.3532067. Online publication date: 6-Jul-2022
  • (2021) Recommending Search Queries in Documents Using Inter N-Gram Similarities. Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval, pages 211-220. DOI: 10.1145/3471158.3472252. Online publication date: 11-Jul-2021
  • (2021) Pre-training for Ad-hoc Retrieval. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 1212-1221. DOI: 10.1145/3459637.3482286. Online publication date: 26-Oct-2021
