DOI: 10.1145/2396761.2530275

Towards Concept-Based Translation Models Using Search Logs for Query Expansion

Published: 29 October 2012

Abstract

Query logs have been used successfully to improve Web search. One line of work exploits user clickthrough data to extract terms related to a query and uses them for query expansion (QE). However, these term relations have been built between isolated terms without considering their context, which gives rise to the problem of term ambiguity. To address this problem, we propose several ways to place terms in their contexts. On the one hand, contiguous terms can form a phrase; on the other hand, terms in proximity to one another can impose looser but still useful contextual constraints on each other. Relations extracted between such constrained groups of terms are expected to be less noisy than those between single terms. In this paper, we call these constrained groups of terms concepts. We exploit user query logs to build statistical translation models between concepts, which are then used for QE.
We perform experiments on the Web search task using a real-world data set. Results show that the concept-based statistical translation model trained on clickthrough data significantly outperforms other state-of-the-art QE systems.
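
To make the idea of concepts more concrete, the following is a minimal Python sketch, not the model described in the paper. It assumes that a concept is a single term, a contiguous two-word phrase, or an unordered pair of non-adjacent terms within a small window, and it estimates translation probabilities from (query, clicked title) pairs by plain relative co-occurrence frequency. A real statistical translation model would typically be trained with an alignment procedure such as EM. The function names, the `window` parameter, and the toy click data are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def extract_concepts(text, window=3):
    """Return single terms, contiguous bigram phrases, and proximity pairs."""
    terms = text.lower().split()
    concepts = [("term", t) for t in terms]
    # Contiguous terms form a phrase (bigrams only, for brevity).
    concepts += [("phrase", f"{a} {b}") for a, b in zip(terms, terms[1:])]
    # Non-adjacent terms fewer than `window` positions apart form an
    # unordered proximity pair (an assumed, simplified definition).
    for i, j in combinations(range(len(terms)), 2):
        if 1 < j - i < window:
            pair = " ".join(sorted((terms[i], terms[j])))
            concepts.append(("proximity", pair))
    return concepts


def train_translation_table(click_pairs, window=3):
    """Estimate P(title concept | query concept) by relative frequency."""
    counts = defaultdict(Counter)
    for query, title in click_pairs:
        title_concepts = set(extract_concepts(title, window))
        for qc in set(extract_concepts(query, window)):
            for tc in title_concepts:
                if tc[0] == qc[0]:  # relate only concepts of the same type
                    counts[qc][tc] += 1
    table = {}
    for qc, row in counts.items():
        total = sum(row.values())
        table[qc] = {tc: n / total for tc, n in row.items()}
    return table


def expand(query, table, k=2, window=3):
    """Return up to k expansion concepts that are not already in the query."""
    own = set(extract_concepts(query, window))
    scores = Counter()
    for qc in own:
        for tc, p in table.get(qc, {}).items():
            if tc not in own:
                scores[tc] += p
    return [tc for tc, _ in scores.most_common(k)]


if __name__ == "__main__":
    # Toy clickthrough data: (query, title of the clicked document).
    clicks = [
        ("jaguar car", "jaguar car dealer"),
        ("jaguar cat", "jaguar big cat habitat"),
    ]
    table = train_translation_table(clicks)
    print(expand("jaguar car", table))
```

In this toy setting the isolated term "jaguar" translates to both car-related and cat-related terms, whereas the phrase concept "jaguar car" only translates to concepts seen in titles clicked for that phrase, which illustrates why relations between concepts are expected to be less noisy.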

Published In

CIKM '12: Proceedings of the 21st ACM International Conference on Information and Knowledge Management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761
Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions (22%)
