DOI: 10.1145/1507509.1507511
research-article

Analysis of long queries in a large scale search log

Published: 09 February 2009

Abstract

We propose using the search log to study long queries, both to understand the types of information needs behind them and to design techniques that improve search effectiveness when such queries are issued. Long queries arise in many applications, such as community-based question answering (CQA) and literature search, and they have been studied to some extent using TREC data. However, they are also quite common in web search, as the distribution of query lengths in a large-scale search log shows.
In this paper, we analyze the long queries in the search log to identify the characteristics of the most commonly occurring query types and the issues involved in using them effectively in a search engine. In addition, we propose a simple yet effective method for evaluating the performance of queries in the search log by combining its click data with existing TREC corpora.
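The query-length distribution mentioned in the abstract is straightforward to compute. The following is a minimal sketch, assuming the log is available as a list of raw query strings and that length is counted in whitespace-separated terms. The threshold `LONG_QUERY_MIN_TERMS = 5`, the function names, and the toy log are hypothetical choices for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical threshold: a query counts as "long" if it has at least this many
# whitespace-separated terms. This value is an assumption, not the paper's definition.
LONG_QUERY_MIN_TERMS = 5


def query_length_distribution(queries):
    """Map query length (in whitespace-separated terms) to its frequency in the log."""
    # Blank or whitespace-only entries are skipped rather than counted as length 0.
    return Counter(len(q.split()) for q in queries if q.strip())


def long_query_fraction(queries, min_terms=LONG_QUERY_MIN_TERMS):
    """Fraction of non-empty queries that have at least `min_terms` terms."""
    dist = query_length_distribution(queries)
    total = sum(dist.values())
    if total == 0:
        return 0.0
    long_count = sum(n for length, n in dist.items() if length >= min_terms)
    return long_count / total


if __name__ == "__main__":
    # Made-up toy entries; a real search log would be read from storage.
    log = [
        "weather",
        "cheap flights to paris",
        "how do i get rid of ants in my kitchen",
        "what is the capital of australia",
        "python",
    ]
    for length, count in sorted(query_length_distribution(log).items()):
        print(length, count)
    print(long_query_fraction(log))
```

On this toy log, lengths 1, 4, 6, and 10 occur 2, 1, 1, and 1 times, so the sketch reports a long-query fraction of 0.4. Splitting on whitespace is a deliberate simplification; a fuller analysis would likely normalize punctuation or segment queries before counting terms.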

    Published In

    WSCD '09: Proceedings of the 2009 workshop on Web Search Click Data
    February 2009
    95 pages
    ISBN: 9781605584348
    DOI: 10.1145/1507509
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. click data
    2. long queries
    3. web search

    Qualifiers

    • Research-article

    Article Metrics

    • Downloads (last 12 months): 14
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 12 Nov 2024

    Cited By

    • (2024) Exploratory and directed search strategies at a social science data archive. IASSIST Quarterly 48:1. DOI: 10.29173/iq1087. Online publication date: 28-Mar-2024.
    • (2024) Online and Offline Evaluation in Search Clarification. ACM Transactions on Information Systems 43:1 (1-30). DOI: 10.1145/3681786. Online publication date: 4-Nov-2024.
    • (2024) Re-evaluating the Command-and-Control Paradigm in Conversational Search Interactions. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (2260-2270). DOI: 10.1145/3627673.3679588. Online publication date: 21-Oct-2024.
    • (2024) Prediction of the Realisation of an Information Need: An EEG Study. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (2584-2588). DOI: 10.1145/3626772.3657981. Online publication date: 10-Jul-2024.
    • (2024) Device-dependent click-through rate estimation in Google organic search results based on clicks and impressions data. Aslib Journal of Information Management. DOI: 10.1108/AJIM-04-2023-0107. Online publication date: 10-Jan-2024.
    • (2022) Information Need Awareness. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (610-621). DOI: 10.1145/3477495.3531999. Online publication date: 6-Jul-2022.
    • (2021) An interactive query-based approach for summarizing scientific documents. Information Discovery and Delivery 50:2 (176-191). DOI: 10.1108/IDD-10-2020-0124. Online publication date: 14-Jun-2021.
    • (2019) Exploring Video Game Searches on the Web. Companion Proceedings of The 2019 World Wide Web Conference (1161-1170). DOI: 10.1145/3308560.3314999. Online publication date: 13-May-2019.
    • (2019) Characterizing searches for mathematical concepts. Proceedings of the 18th Joint Conference on Digital Libraries (57-66). DOI: 10.1109/JCDL.2019.00019. Online publication date: 2-Jun-2019.
    • (2019) Users’ Search Satisfaction in Search Engine Optimization. Proceeding of the International Conference on Computer Networks, Big Data and IoT (ICCBI - 2018) (1035-1045). DOI: 10.1007/978-3-030-24643-3_124. Online publication date: 1-Aug-2019.
