
Clickthrough-based translation models for web search: from word models to phrase models

Published: 26 October 2010

Abstract

Web search is challenging in part because search queries and Web documents use different language styles and vocabularies. This paper provides a quantitative analysis of this language discrepancy and explores the use of clickthrough data to bridge documents and queries. We assume that a query is parallel to the titles of the documents clicked on for that query. Two translation models are trained and integrated into retrieval models: a word-based translation model that learns translation probabilities between single words, and a phrase-based translation model that learns translation probabilities between multi-term phrases. Experiments are carried out on a real-world data set. The results show that retrieval systems that use the translation models significantly outperform those that do not. The paper also demonstrates that standard statistical machine translation techniques, such as word alignment, bilingual phrase extraction, and phrase-based decoding, can be adapted to build a better Web document retrieval system.
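As a rough illustration of the word-based model, the sketch below treats each (query, clicked title) pair as a parallel sentence pair and estimates word translation probabilities with IBM Model 1-style EM. The choice of IBM Model 1, the linear smoothing with a background model, the function names, and the toy data are all assumptions made for illustration; the abstract does not specify these details.

```python
from collections import defaultdict


def train_word_translation(pairs, iterations=10):
    """Estimate t[(q, w)]: probability that title word w translates to query word q.

    pairs: list of (query_tokens, title_tokens) taken from clickthrough data.
    Assumption: IBM Model 1-style EM without a NULL word.
    """
    query_vocab = {q for query, _ in pairs for q in query}
    uniform = 1.0 / len(query_vocab)
    t = defaultdict(lambda: uniform)  # uniform start for every (q, w)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of w generating q
        total = defaultdict(float)  # expected count of w generating anything
        # E-step: distribute each query word over the title words.
        for query, title in pairs:
            for q in query:
                norm = sum(t[(q, w)] for w in title)
                for w in title:
                    frac = t[(q, w)] / norm
                    count[(q, w)] += frac
                    total[w] += frac
        # M-step: renormalise so that sum_q t(q | w) = 1 for each title word w.
        t = defaultdict(float, {k: c / total[k[1]] for k, c in count.items()})
    return t


def translation_score(query, title, t, background, lam=0.5):
    """Score a title for a query (hypothetical smoothing scheme).

    P(Q | D) = prod_q [ lam * P_bg(q) + (1 - lam) * sum_w t(q | w) * P(w | D) ],
    with P(w | D) taken as the relative frequency of w in the title.
    """
    score = 1.0
    for q in query:
        p_trans = sum(t.get((q, w), 0.0) for w in title) / max(len(title), 1)
        score *= lam * background.get(q, 1e-6) + (1 - lam) * p_trans
    return score


# Toy clickthrough data, invented for illustration only.
pairs = [
    (["cheap", "flights"], ["low", "cost", "airfare"]),
    (["cheap", "hotels"], ["low", "cost", "hotels"]),
    (["flights"], ["airfare"]),
]
t = train_word_translation(pairs)
print(translation_score(["flights"], ["airfare"], t, background={}))
```

The point of the sketch is that a title containing "airfare" can receive a non-trivial score for the query "flights" even though the two share no terms, which is the vocabulary gap the abstract describes. A phrase-based variant would replace single words with multi-term phrases extracted from the word alignments.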


Information & Contributors

Information

Published In

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN: 9781450300995
DOI: 10.1145/1871437
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clickthrough data
  2. language model
  3. linear ranking model
  4. PLSA
  5. translation model
  6. web search

Qualifiers

  • Research-article

Conference

CIKM '10

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 19
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 09 Feb 2025

Citations

Cited By

  • (2025) Structural Analysis of Design Documents. Industrial Intelligence: Methods and Applications (87-124). DOI: 10.1007/978-3-031-81477-8_4. Online publication date: 4-Feb-2025
  • (2023) Deep Query Rewriting For Geocoding. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (4801-4807). DOI: 10.1145/3583780.3615466. Online publication date: 21-Oct-2023
  • (2023) Integrating Representation and Interaction for Context-Aware Document Ranking. ACM Transactions on Information Systems 41:1 (1-23). DOI: 10.1145/3529955. Online publication date: 10-Jan-2023
  • (2023) Information Retrieval: Recent Advances and Beyond. IEEE Access 11 (76581-76604). DOI: 10.1109/ACCESS.2023.3295776. Online publication date: 2023
  • (2023) Field features: The impact in learning to rank approaches. Applied Soft Computing 138 (110183). DOI: 10.1016/j.asoc.2023.110183. Online publication date: May-2023
  • (2022) Semantic Models for the First-Stage Retrieval: A Comprehensive Review. ACM Transactions on Information Systems 40:4 (1-42). DOI: 10.1145/3486250. Online publication date: 24-Mar-2022
  • (2021) Application of Deep Learning Model Convolution Neural Network for Effective Web Information Retrieval. Handbook of Research on Machine Learning Techniques for Pattern Recognition and Information Security (100-120). DOI: 10.4018/978-1-7998-3299-7.ch007. Online publication date: 2021
  • (2021) Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (2780-2791). DOI: 10.1145/3459637.3482243. Online publication date: 26-Oct-2021
  • (2020) Contextual Re-Ranking with Behavior Aware Transformers. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (1589-1592). DOI: 10.1145/3397271.3401276. Online publication date: 25-Jul-2020
  • (2020) Multi-Task Learning for Entity Recommendation and Document Ranking in Web Search. ACM Transactions on Intelligent Systems and Technology 11:5 (1-24). DOI: 10.1145/3396501. Online publication date: 26-Jul-2020
