
Query clustering using user logs

Published: 01 January 2002

    Abstract

    Query clustering is a process used to discover frequently asked questions or the most popular topics on a search engine. This process is crucial for search engines based on question answering. Because queries are short, approaches based on keywords alone are not suitable for query clustering. This paper describes a new query clustering method that makes use of user logs, which allow us to identify the documents users have selected for a query. The similarity between two queries may be deduced from the common documents users selected for them. Our experiments show that combining keywords and user logs is better than using either method alone.
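    The abstract's central idea (two queries are similar when users selected the same documents for them, and this signal can be combined with keyword overlap) can be sketched as below. This is a minimal illustration under assumed definitions, not the paper's formulas: similarity is the number of shared items divided by the size of the larger set, keywords come from a naive lowercase whitespace split, and the two scores are mixed linearly with hypothetical weights `alpha` and `beta`. All document IDs are toy data.

```python
from typing import Set


def overlap_similarity(a: Set[str], b: Set[str]) -> float:
    """Number of shared items divided by the size of the larger set."""
    larger = max(len(a), len(b))
    if larger == 0:
        return 0.0  # two empty sets: treat as unrelated
    return len(a & b) / larger


def keyword_similarity(q1: str, q2: str) -> float:
    """Keyword overlap between two query strings.

    Assumption: keywords are lowercase whitespace tokens; no stemming
    or stop-word removal is applied here.
    """
    return overlap_similarity(set(q1.lower().split()), set(q2.lower().split()))


def click_similarity(docs1: Set[str], docs2: Set[str]) -> float:
    """Overlap of the documents users selected for each query (from the log)."""
    return overlap_similarity(docs1, docs2)


def combined_similarity(q1: str, docs1: Set[str], q2: str, docs2: Set[str],
                        alpha: float = 0.5, beta: float = 0.5) -> float:
    """Hypothetical linear mix of the two signals; alpha and beta are illustrative."""
    return alpha * keyword_similarity(q1, q2) + beta * click_similarity(docs1, docs2)


if __name__ == "__main__":
    # Toy pairs with hypothetical document IDs.
    print(keyword_similarity("atomic bomb", "manhattan project"))  # 0.0
    print(click_similarity({"d1", "d2"}, {"d1", "d2", "d3"}))       # 0.666...
    print(keyword_similarity("java island", "java programming"))   # 0.5
    print(click_similarity({"d7"}, {"d9"}))                         # 0.0
```

    The toy pairs mirror the argument for a hybrid: "atomic bomb" and "manhattan project" share no keywords but share selected documents, while "java island" and "java programming" share a keyword but no selected documents, so each signal on its own misjudges one of the pairs.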




    Reviews

    Dimitrios Katsaros

    The problem of identifying frequently asked questions (FAQs) in information retrieval systems has been previously addressed by clustering the queries into groups according to their similarity. This clustering has been based either on the contents of the queries (clustering based on query content), or solely on the documents that users select as the answer to their query (clustering based on cross-references). This paper proposes a new query clustering method that is a hybrid of the aforementioned approaches.

    The major contribution of this work is its demonstration of the fact that exploiting only the contents of the query, or only users’ judgment, cannot give correct results in all cases. The paper shows that keyword matching fails to cluster queries effectively, because of the ambiguity of words; and clustering based on cross-references also fails, because it creates query clusters with extremely broad topics.

    The major weakness of the paper is that it does not demonstrate whether these findings also apply in the case of search engines for the Web, which is both the ultimate challenge and the stated objective of the paper. The results presented in the paper are drawn from a commercial online encyclopedia, and since the encyclopedia’s contents are relatively coherent compared to the Web, it is natural to discover query clusters in this limited venue.

    The paper is well written, and requires no special background in order to understand the ideas presented, although basic knowledge of information retrieval terminology would be very helpful. Overall, the paper is an interesting case study for query clustering, but it cannot answer the question of whether or not the proposed methods apply in an enormous, dynamic information retrieval system like the Web.

    Online Computing Reviews Service
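    The review contrasts clustering by query content with clustering by cross-references (shared selected documents). Neither the review nor the abstract names the clustering algorithm, so, as an assumption, this sketch groups queries with a density-based procedure in the style of DBSCAN, driven by any pairwise similarity function. Here the similarity is the overlap of selected documents in a hypothetical toy log; the thresholds `min_sim` and `min_pts` are illustrative, not values from the paper.

```python
from typing import Callable, Dict, List, Sequence, Set

NOISE = -1


def dbscan(items: Sequence[str], sim: Callable[[str, str], float],
           min_sim: float, min_pts: int) -> Dict[str, int]:
    """DBSCAN-style clustering over a similarity function.

    A query's neighbourhood is every query (itself included) whose similarity
    is at least `min_sim`; a query with at least `min_pts` neighbours is a
    core point. Returns a mapping from query to cluster id, NOISE for outliers.
    """
    labels: Dict[str, int] = {}

    def neighbours(p: str) -> List[str]:
        return [q for q in items if sim(p, q) >= min_sim]

    cluster_id = 0
    for p in items:
        if p in labels:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # may later be claimed as a border point
            continue
        labels[p] = cluster_id
        i = 0
        while i < len(seeds):
            q = seeds[i]
            i += 1
            if labels.get(q) == NOISE:
                labels[q] = cluster_id  # former noise becomes a border point
                continue
            if q in labels:
                continue
            labels[q] = cluster_id
            q_neighbours = neighbours(q)
            if len(q_neighbours) >= min_pts:  # expand only from core points
                seeds.extend(q_neighbours)
        cluster_id += 1
    return labels


# Toy log: query -> IDs of documents users selected (hypothetical data).
LOG: Dict[str, Set[str]] = {
    "atomic bomb": {"d1", "d2"},
    "manhattan project": {"d1", "d2", "d3"},
    "hiroshima": {"d2", "d3"},
    "java island": {"d7"},
    "indonesia travel": {"d7", "d8"},
    "python tutorial": {"d20"},
}


def click_sim(p: str, q: str) -> float:
    """Shared selected documents divided by the larger document set."""
    a, b = LOG[p], LOG[q]
    return len(a & b) / max(len(a), len(b), 1)


if __name__ == "__main__":
    print(dbscan(list(LOG), click_sim, min_sim=0.5, min_pts=2))
    # {'atomic bomb': 0, 'manhattan project': 0, 'hiroshima': 0,
    #  'java island': 1, 'indonesia travel': 1, 'python tutorial': -1}
```

    A hybrid along the lines the review describes would pass a weighted mix of keyword and click similarity as `sim` instead of `click_sim`. A density-based method suits this setting because it does not require fixing the number of clusters in advance and leaves isolated queries unclustered as noise.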


    Information

    Published In

    ACM Transactions on Information Systems, Volume 20, Issue 1
    January 2002
    131 pages
    ISSN:1046-8188
    EISSN:1558-2868
    DOI:10.1145/503104

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published in TOIS Volume 20, Issue 1


    Author Tags

    1. Query clustering
    2. search engine
    3. user log
    4. web data mining

    Qualifiers

    • Article

    Cited By

    • (2024) Intent Identification by Semantically Analyzing the Search Query. Modelling 5:1, 292-314. DOI: 10.3390/modelling5010016. Online publication date: 22-Feb-2024.
    • (2023) The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2848-2860. DOI: 10.1145/3539618.3591890. Online publication date: 19-Jul-2023.
    • (2022) Deep Neural Network and Pseudo Relevance Feedback Based Query Expansion. Computers, Materials & Continua 71:2, 3557-3570. DOI: 10.32604/cmc.2022.022411. Online publication date: 2022.
    • (2021) Web Page Recommender System using hybrid of Genetic Algorithm and Trust for Personalized Web Search. Research Anthology on Multi-Industry Uses of Genetic Programming and Algorithms, 656-675. DOI: 10.4018/978-1-7998-8048-6.ch034. Online publication date: 2021.
    • (2021) Event-based Product Carousel Recommendation with Query-Click Graph. 2021 IEEE International Conference on Big Data (Big Data), 4119-4125. DOI: 10.1109/BigData52589.2021.9671649. Online publication date: 15-Dec-2021.
    • (2020) Evaluating the Effectiveness of Query-Document Clustering Using the QDSM Measure. Advances in Science, Technology and Engineering Systems Journal 5:6, 883-893. DOI: 10.25046/aj0506105. Online publication date: Dec-2020.
    • (2019) Boosting Search Performance Using Query Variations. ACM Transactions on Information Systems 37:4, 1-25. DOI: 10.1145/3345001. Online publication date: 4-Oct-2019.
    • (2019) Efficient differentiable programming in a functional array-processing language. Proceedings of the ACM on Programming Languages 3:ICFP, 1-30. DOI: 10.1145/3341701. Online publication date: 26-Jul-2019.
    • (2019) Comparing the Effectiveness of Query-Document Clusterings Using the QDSM and Cosine Similarity. 2019 38th International Conference of the Chilean Computer Science Society (SCCC), 1-8. DOI: 10.1109/SCCC49216.2019.8966432. Online publication date: Nov-2019.
    • (2019) Query expansion techniques for information retrieval: A survey. Information Processing & Management 56:5, 1698-1735. DOI: 10.1016/j.ipm.2019.05.009. Online publication date: Sep-2019.
