Learning to Rank User Queries to Detect Search Tasks

Published: 12 September 2016 · DOI: 10.1145/2970398.2970407

Abstract

We present a framework for discovering sets of web queries that share a similar latent need, called search tasks, from the user queries stored in a search engine log. The framework consists of two main modules: Query Similarity Learning (QSL) and Graph-based Query Clustering (GQC). The former learns a query similarity function from a ground truth of manually labeled search tasks. The latter represents each user's search log as a graph whose nodes are queries, and uses the learned similarity function to weight the edges between query pairs. Search tasks are finally detected by clustering the queries connected by the strongest links, i.e., by extracting the connected components that remain once weak edges are pruned. To discriminate between "strong" and "weak" links, the GQC module also entails a learning phase, whose goal is to estimate the best threshold for pruning the edges of the graph. We discuss how the QSL module can be effectively implemented using Learning to Rank (L2R) techniques. Experiments on a real-world search engine log show that query similarity functions learned with L2R lead to better-performing GQC implementations than similarity functions induced by other state-of-the-art machine learning solutions, such as logistic regression and decision trees.
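
The GQC step described above reduces to a simple graph procedure once a similarity function is available. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the QSL module has already produced a pairwise similarity function (a toy Jaccard term overlap stands in for the learned L2R model) and that a pruning threshold has already been estimated. The function name detect_search_tasks and the demo data are hypothetical.

```python
# Minimal sketch of graph-based query clustering (GQC), assuming a
# learned pairwise similarity and a learned pruning threshold.
# Illustrative only: names and the toy similarity are not from the paper.
from itertools import combinations
from typing import Callable, Dict, List, Set


def detect_search_tasks(
    queries: List[str],                       # one user's queries, assumed distinct
    similarity: Callable[[str, str], float],  # learned query-pair similarity
    threshold: float,                         # learned edge-pruning threshold
) -> List[Set[str]]:
    """Return search tasks as connected components of the pruned query graph."""
    # Build the adjacency list, keeping only the "strong" links.
    adj: Dict[str, Set[str]] = {q: set() for q in queries}
    for q1, q2 in combinations(queries, 2):
        if similarity(q1, q2) >= threshold:
            adj[q1].add(q2)
            adj[q2].add(q1)

    # Extract connected components with an iterative depth-first search.
    tasks: List[Set[str]] = []
    visited: Set[str] = set()
    for q in queries:
        if q in visited:
            continue
        component, stack = set(), [q]
        while stack:
            node = stack.pop()
            if node not in visited:
                visited.add(node)
                component.add(node)
                stack.extend(adj[node] - visited)
        tasks.append(component)
    return tasks


if __name__ == "__main__":
    # Toy stand-in for the learned similarity: Jaccard overlap of query terms.
    def jaccard(a: str, b: str) -> float:
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / len(ta | tb)

    log = ["cheap flights rome", "rome flights deals",
           "python list sort", "sort list python example"]
    # Two tasks are recovered: the Rome trip and the Python question.
    print(detect_search_tasks(log, jaccard, threshold=0.3))
```

Connected components keep the clustering itself parameter-free; the only knob is the pruning threshold, which is why the GQC learning phase in the paper concentrates on estimating that single value from labeled tasks.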


Cited By

  • (2023) Recommending tasks based on search queries and missions. Natural Language Engineering, pp. 1-25. DOI: 10.1017/S1351324923000219. Online publication date: 17-May-2023.
  • (2017) Periodicity in User Engagement with a Search Engine and Its Application to Online Controlled Experiments. ACM Transactions on the Web, 11(2):1-35. DOI: 10.1145/2856822. Online publication date: 14-Apr-2017.


Published In

ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
September 2016
318 pages
ISBN:9781450344975
DOI:10.1145/2970398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. query log mining
  2. search task discovery

Qualifiers

  • Research-article

Conference

ICTIR '16

Acceptance Rates

ICTIR '16 paper acceptance rate: 41 of 79 submissions, 52%.
Overall acceptance rate: 235 of 527 submissions, 45%.

