DOI: 10.1145/1571941.1572001
Research article

Entropy-biased models for query representation on the click graph

Published: 19 July 2009

Abstract

Query log analysis has received substantial attention in recent years, and the click graph is an important technique for describing the relationship between queries and URLs. State-of-the-art approaches that model the click graph from raw click frequencies, however, do not eliminate noise, nor do they handle heterogeneous query-URL pairs well. In this paper, we investigate and develop a novel entropy-biased framework for modeling click graphs. The intuition behind this model is that different query-URL pairs should be treated differently: common clicks on less frequent but more specific URLs are of greater value than common clicks on frequent and general URLs. Based on this intuition, we use the entropy information of the URLs and introduce a new concept, the inverse query frequency (IQF), to weigh the importance (discriminative ability) of a click on a certain URL. The IQF weighting scheme has never been explicitly explored or statistically examined for any bipartite graph in the information retrieval literature. We not only formally define and quantify this scheme, but also combine it with the click frequency and user frequency information on the click graph for an effective query representation. To illustrate our methodology, we conduct experiments on the AOL query log data for query similarity analysis and query suggestion tasks. Experimental results demonstrate that our entropy-biased models yield considerable improvements in performance. Moreover, our method can also be applied to other bipartite graphs.
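
The weighting idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the IQF formula, so the IDF-style form used here, log(|Q| / n(u)) with |Q| the number of distinct queries and n(u) the number of distinct queries that clicked URL u, is an assumption made by analogy with inverse document frequency. The function names, the toy click log, and the example.com URLs are hypothetical, and the user-frequency variant is not shown.

```python
import math
from collections import defaultdict


def inverse_query_frequency(clicks):
    """Return {url: iqf} for a list of (query, url, click_count) triples.

    Assumed form (by analogy with IDF, not taken from the abstract):
    iqf(u) = log(|Q| / n(u)), where n(u) counts distinct queries clicking u.
    """
    queries = set()
    queries_per_url = defaultdict(set)
    for query, url, _count in clicks:
        queries.add(query)
        queries_per_url[url].add(query)
    total = len(queries)
    return {url: math.log(total / len(qs)) for url, qs in queries_per_url.items()}


def cf_iqf_vectors(clicks):
    """Represent each query as a sparse vector of click_count * iqf(url)."""
    clicks = list(clicks)
    iqf = inverse_query_frequency(clicks)
    vectors = defaultdict(dict)
    for query, url, count in clicks:
        vectors[query][url] = vectors[query].get(url, 0.0) + count * iqf[url]
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy click log: every query clicks the general portal,
# while the more specific URLs are clicked by fewer queries.
toy_clicks = [
    ("map", "maps.example.com", 5),
    ("map", "portal.example.com", 3),
    ("weather", "portal.example.com", 4),
    ("weather", "weather.example.com", 6),
    ("driving directions", "maps.example.com", 2),
    ("driving directions", "portal.example.com", 1),
]

iqf = inverse_query_frequency(toy_clicks)
vectors = cf_iqf_vectors(toy_clicks)
sim_related = cosine(vectors["map"], vectors["driving directions"])
sim_unrelated = cosine(vectors["map"], vectors["weather"])
```

In this toy log the portal URL is clicked by every query, so its assumed IQF is zero and it no longer makes "map" and "weather" look similar, while the shared click on the more specific maps URL still links "map" with "driving directions".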

      Published In

      SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
      July 2009
      896 pages
      ISBN: 9781605584836
      DOI: 10.1145/1571941

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. click frequency
      2. click graph
      3. entropy-biased model
      4. inverse query frequency
      5. user frequency

      Conference

      SIGIR '09

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%
