Towards Concept-Based Translation Models Using Search Logs for Query Expansion

Authors:

Jian-Yun Nie

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

Article No.: 1, Pages 1 - 10

https://doi.org/10.1145/2396761.2530275

Published: 29 October 2012

Abstract

Query logs have been used successfully to improve Web search. One line of work exploits user clickthrough data to extract terms related to a query and uses them for query expansion (QE). However, term relations have typically been established between isolated terms without considering their context, which gives rise to the problem of term ambiguity. To address this problem, we propose several ways to place terms in context. On the one hand, contiguous terms can form a phrase; on the other hand, terms in proximity can provide less strict but still useful contextual constraints for one another. Relations extracted between such more constrained groups of terms are expected to be less noisy than those between single terms. In this paper, these constrained groups of terms are called concepts. We exploit user query logs to build statistical translation models between concepts, which are then used for QE.

We perform experiments on a Web search task using a real-world data set. Results show that the concept-based statistical translation model trained on clickthrough data significantly outperforms other state-of-the-art QE systems.
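To make the notion of a concept more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the model from the paper: concepts are taken to be single terms, contiguous bigrams, and unordered term pairs within a small window, and the "translation" probabilities are plain relative co-occurrence frequencies rather than a trained statistical translation model. The function names, the `#prox` marker, the window size, and the toy query-title pairs are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def extract_concepts(query, window=3):
    """Return single terms, contiguous bigrams, and unordered proximity pairs."""
    terms = query.lower().split()
    concepts = [(t,) for t in terms]  # isolated terms
    # Contiguous terms form a phrase (here: bigrams only).
    concepts += [tuple(terms[i:i + 2]) for i in range(len(terms) - 1)]
    # Non-adjacent terms within the window form a looser proximity concept.
    for i, j in combinations(range(len(terms)), 2):
        if 1 < j - i < window:
            concepts.append(("#prox", *sorted((terms[i], terms[j]))))
    return concepts


def train_translation_table(pairs, window=3):
    """Estimate P(title term | query concept) by relative co-occurrence counts.

    This is a simplification (an assumption of this sketch); it is not an
    EM-trained translation model.
    """
    counts = defaultdict(Counter)
    for query, title in pairs:
        query_terms = set(query.lower().split())
        new_terms = set(title.lower().split()) - query_terms
        for concept in extract_concepts(query, window):
            for w in new_terms:
                counts[concept][w] += 1
    return {
        c: {w: n / sum(ws.values()) for w, n in ws.items()}
        for c, ws in counts.items()
    }


def expand(query, table, k=3, window=3):
    """Return the k expansion terms with the highest summed probability."""
    scores = Counter()
    for concept in extract_concepts(query, window):
        for w, p in table.get(concept, {}).items():
            scores[w] += p
    return [w for w, _ in scores.most_common(k)]


# Hypothetical (query, clicked-title) pairs standing in for clickthrough data.
pairs = [
    ("jaguar price", "jaguar car dealer prices"),
    ("jaguar habitat", "jaguar animal rainforest habitat"),
]
table = train_translation_table(pairs)
expansion = expand("jaguar price", table)
```

In this toy data the isolated term `jaguar` spreads its probability over both the car and the animal sense, while the phrase concept `("jaguar", "price")` only maps to car-related terms, which is the kind of disambiguation the abstract attributes to concepts.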


Published In

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

October 2012

2840 pages

ISBN: 9781450311564

DOI: 10.1145/2396761

General Chair: Xuewen Chen (Wayne State University, USA)

Program Chairs: Guy Lebanon (Georgia Institute of Technology), Haixun Wang (Microsoft Research Asia), Mohammed J. Zaki (Rensselaer Polytechnic Institute)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM'12: 21st ACM International Conference on Information and Knowledge Management

October 29 - November 2, 2012

Maui, Hawaii, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
