DOI: 10.1145/1458082.1458176

Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs

Published: 26 October 2008

Abstract

Most analysis of web search relevance and performance takes a single query as the unit of search engine interaction. When studies attempt to group queries together by task or session, a timeout is typically used to identify the boundary. However, users query search engines in order to accomplish tasks at a variety of granularities, issuing multiple queries as they attempt to accomplish tasks. In this work we study real sessions manually labeled into hierarchical tasks, and show that timeouts, whatever their length, are of limited utility in identifying task boundaries, achieving a maximum precision of only 70%. We report on properties of this search task hierarchy, as seen in a random sample of user interactions from a major web search engine's log, annotated by human editors, learning that 17% of tasks are interleaved, and 20% are hierarchically organized. No previous work has analyzed or addressed automatic identification of interleaved and hierarchically organized search tasks. We propose and evaluate a method for the automated segmentation of users' query streams into hierarchical units. Our classifiers can improve on timeout segmentation, as well as other previously published approaches, bringing the accuracy up to 92% for identifying fine-grained task boundaries, and 89-97% for identifying pairs of queries from the same task when tasks are interleaved hierarchically. This is the first work to identify, measure and automatically segment sequences of user queries into their hierarchical structure. The ability to perform this kind of segmentation paves the way for evaluating search engines in terms of user task completion.
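The timeout baseline that the abstract argues against can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 30-minute threshold, the function names `timeout_boundaries` and `boundary_precision`, and the sample timestamps and labels are all assumptions made for the example. It places a boundary wherever the gap between consecutive queries exceeds the timeout, then measures precision against hand-labeled boundaries.

```python
from datetime import datetime, timedelta


def timeout_boundaries(timestamps, timeout):
    """Return indices i where a boundary is predicted between query i-1 and query i."""
    return [
        i for i in range(1, len(timestamps))
        if timestamps[i] - timestamps[i - 1] > timeout
    ]


def boundary_precision(predicted, labeled):
    """Fraction of predicted boundaries that are also labeled boundaries."""
    if not predicted:
        return 0.0
    labeled = set(labeled)
    return sum(1 for i in predicted if i in labeled) / len(predicted)


# Hypothetical query stream for one user (illustrative data, not from the paper).
start = datetime(2008, 1, 1, 9, 0)
offsets_min = [0, 2, 3, 50, 52, 120]
timestamps = [start + timedelta(minutes=m) for m in offsets_min]

# Assumed 30-minute timeout; the abstract does not fix a particular value.
predicted = timeout_boundaries(timestamps, timedelta(minutes=30))

# Hypothetical editor-labeled task boundaries (indices into the query stream).
labeled = [2, 3]
precision = boundary_precision(predicted, labeled)
```

In this toy stream the timeout misses the labeled boundary at index 2, where the user switched tasks after only one minute, and predicts a boundary at index 5 that the labels do not contain. A time gap alone cannot separate these cases, which is the kind of limitation the abstract reports.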

    Published In

    CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
    October 2008
    1562 pages
    ISBN: 9781595939913
    DOI: 10.1145/1458082

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. query log segmentation
    2. query session
    3. query session boundary detection
    4. search goal

    Qualifiers

    • Research-article

    Conference

    CIKM08
    CIKM08: Conference on Information and Knowledge Management
    October 26 - 30, 2008
    Napa Valley, California, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
