DOI: 10.1145/1935826.1935875
research-article

Identifying task-based sessions in search engine query logs

Published: 09 February 2011

Abstract

The research challenge addressed in this paper is to devise effective techniques for identifying task-based sessions, i.e., sets of possibly non-contiguous queries issued by the user of a Web search engine to carry out a given task. To evaluate and compare different approaches, we manually labeled a query log to build a ground truth in which the queries are grouped into tasks. Our analysis of this ground truth shows that users tend to perform more than one task at the same time: about 75% of the submitted queries involve multi-tasking activity. We formally define the Task-based Session Discovery Problem (TSDP) as the problem of best approximating the manually annotated tasks, and we propose several variants of well-known clustering algorithms, as well as a novel, efficient heuristic algorithm specifically tuned for solving the TSDP. These algorithms also exploit the collaborative knowledge collected by Wiktionary and Wikipedia to detect query pairs that are not lexically similar but are semantically related. The proposed algorithms have been evaluated on this ground truth and are shown to perform better than state-of-the-art approaches because they effectively take into account the multi-tasking behavior of users.
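
To make the idea of grouping non-contiguous queries into tasks concrete, here is a minimal sketch. It is not the algorithm proposed in the paper: it assumes a purely lexical similarity (Jaccard overlap of character trigrams), an arbitrary threshold of 0.3, and connected components as the clustering step. The function names `trigrams`, `jaccard`, and `group_into_tasks` are hypothetical, and the semantic signals from Wiktionary and Wikipedia described above are left out.

```python
from itertools import combinations


def trigrams(query: str) -> set[str]:
    """Character 3-grams of a lowercased, space-padded query."""
    padded = f"  {query.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two trigram sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_into_tasks(queries: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Link query pairs whose similarity reaches the (assumed) threshold,
    then return the connected components as candidate tasks."""
    parent = list(range(len(queries)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    grams = [trigrams(q) for q in queries]
    for i, j in combinations(range(len(queries)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(j)] = find(i)

    groups: dict[int, list[str]] = {}
    for idx, query in enumerate(queries):
        groups.setdefault(find(idx), []).append(query)
    return list(groups.values())


# Two interleaved tasks from one hypothetical user, in submission order.
log = ["rome flights", "python tutorial", "rome flights cheap", "python tutorial pdf"]
tasks = group_into_tasks(log)
```

In this toy log the two tasks alternate, so a splitter that only cuts the log at time gaps could not separate them, whereas similarity-based grouping does not depend on contiguity. A lexical measure alone would still miss related queries that share no characters, which is the gap the abstract says the Wiktionary and Wikipedia knowledge is meant to fill.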

Supplementary Material

JPG File (wsdm2011_tolomei_itb_01.jpg)
MP4 File (wsdm2011_tolomei_itb_01.mp4)


Published In

WSDM '11: Proceedings of the fourth ACM international conference on Web search and data mining
February 2011
870 pages
ISBN: 9781450304931
DOI: 10.1145/1935826
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. query clustering
  2. query log analysis
  3. query log session detection
  4. task-based session
  5. user search intent

Acceptance Rates

WSDM '11 Paper Acceptance Rate 83 of 372 submissions, 22%;
Overall Acceptance Rate 498 of 2,863 submissions, 17%

Cited By

  • (2023) Taking Search to Task. Proceedings of the 2023 Conference on Human Information Interaction and Retrieval, pp. 1-13. DOI: 10.1145/3576840.3578288. Online publication date: 19-Mar-2023
  • (2023) Characterization and Prediction of Mobile Tasks. ACM Transactions on Information Systems 41:1, pp. 1-39. DOI: 10.1145/3522711. Online publication date: 9-Jan-2023
  • (2023) Recommending tasks based on search queries and missions. Natural Language Engineering, pp. 1-25. DOI: 10.1017/S1351324923000219. Online publication date: 17-May-2023
  • (2022) A Deep Learning-based Prefetching Approach to Enable Scalability for Data-intensive Applications. 2022 IEEE International Conference on Big Data (Big Data), pp. 2716-2721. DOI: 10.1109/BigData55660.2022.10020591. Online publication date: 17-Dec-2022
  • (2022) Landscape of Automated Log Analysis: A Systematic Literature Review and Mapping Study. IEEE Access 10, pp. 21892-21913. DOI: 10.1109/ACCESS.2022.3152549. Online publication date: 2022
  • (2021) Task Intelligence for Search and Recommendation. Synthesis Lectures on Information Concepts, Retrieval, and Services 13:3, pp. 1-160. DOI: 10.2200/S01103ED1V01Y202105ICR074. Online publication date: 9-Jun-2021
  • (2021) CoST: An annotated Data Collection for Complex Search. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 4455-4464. DOI: 10.1145/3459637.3481998. Online publication date: 26-Oct-2021
  • (2021) A Context-independent Representation of Task. Proceedings of the 2021 Conference on Human Information Interaction and Retrieval, pp. 359-362. DOI: 10.1145/3406522.3446008. Online publication date: 14-Mar-2021
  • (2021) Identifying Queries in Instant Search Logs. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1692-1696. DOI: 10.1145/3404835.3463025. Online publication date: 11-Jul-2021
  • (2021) Hierarchical data generator based on tree-structured stick breaking process for benchmarking clustering methods. Information Sciences 554, pp. 99-119. DOI: 10.1016/j.ins.2020.12.020. Online publication date: Apr-2021
