Query clustering using user logs

ACM Transactions on Information Systems (TOIS), Volume 20, Issue 1

Pages 59 - 81

Published: 01 January 2002

Abstract

Query clustering is a process used to discover frequently asked questions or most popular topics on a search engine. This process is crucial for search engines based on question-answering. Because of the short lengths of queries, approaches based on keywords are not suitable for query clustering. This paper describes a new query clustering method that makes use of user logs which allow us to identify the documents the users have selected for a query. The similarity between two queries may be deduced from the common documents the users selected for them. Our experiments show that a combination of both keywords and user logs is better than using either method alone.
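
As a rough illustration of the idea in the abstract, the sketch below scores two queries by the documents users selected for both, and blends that score with a keyword overlap score. The normalization by the larger set, the weight `alpha`, the helper names, and the toy click log are all assumptions made for illustration; they are not the formulas or data used in the paper.

```python
from collections import defaultdict


def overlap(a, b):
    """Shared items divided by the size of the larger set (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def build_click_index(log):
    """Map each query string to the set of document ids users selected for it."""
    clicks = defaultdict(set)
    for query, doc_id in log:
        clicks[query].add(doc_id)
    return clicks


def combined_similarity(q1, q2, clicks, alpha=0.5):
    """Blend keyword overlap with clicked-document overlap.

    alpha is a hypothetical weight: 1.0 means keywords only, 0.0 means clicks only.
    """
    keyword_sim = overlap(set(q1.lower().split()), set(q2.lower().split()))
    click_sim = overlap(clicks.get(q1, set()), clicks.get(q2, set()))
    return alpha * keyword_sim + (1 - alpha) * click_sim


# Hypothetical log of (query, selected document) pairs.
log = [
    ("atomic bomb", "doc_manhattan_project"),
    ("atomic bomb", "doc_hiroshima"),
    ("nuclear weapon", "doc_manhattan_project"),
    ("nuclear weapon", "doc_hiroshima"),
    ("java island", "doc_indonesia"),
]
clicks = build_click_index(log)

# No shared keywords, but identical clicked documents.
print(combined_similarity("atomic bomb", "nuclear weapon", clicks))
# No shared keywords and no shared clicked documents.
print(combined_similarity("atomic bomb", "java island", clicks))
```

In this toy log the first pair shares no words, so a keyword-only measure would score it zero, while the clicked documents still link the two queries.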

Index Terms

Query clustering using user logs
1. Information systems

Reviews

Reviewer: Dimitrios Katsaros

The problem of identifying frequently asked questions (FAQs) in information retrieval systems has been previously addressed by clustering the queries into groups according to their similarity. This clustering has been based either on the contents of the queries (clustering based on query content), or solely on the documents that users select as the answer to their query (clustering based on cross-references). This paper proposes a new query clustering method that is a hybrid of the aforementioned approaches.

The major contribution of this work is its demonstration of the fact that exploiting only the contents of the query, or only users’ judgment, cannot give correct results in all cases. The paper shows that keyword matching fails to cluster queries effectively, because of the ambiguity of words; and clustering based on cross-references also fails, because it creates query clusters with extremely broad topics.

The major weakness of the paper is that it does not demonstrate whether these findings also apply in the case of search engines for the Web, which is both the ultimate challenge and the stated objective of the paper. The results presented in the paper are drawn from a commercial online encyclopedia, and since the encyclopedia’s contents are relatively coherent compared to the Web, it is natural to discover query clusters in this limited venue.

The paper is well written, and requires no special background in order to understand the ideas presented, although basic knowledge of information retrieval terminology would be very helpful. Overall, the paper is an interesting case study for query clustering, but it cannot answer the question of whether or not the proposed methods apply in an enormous, dynamic information retrieval system like the Web.

Online Computing Reviews Service
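
The two failure modes named in the review can be made concrete with a small sketch that groups queries whenever their similarity reaches a threshold. The single-link grouping rule, the Jaccard measure, the threshold, and the toy queries and click sets are assumptions chosen for illustration; the review does not say which clustering algorithm or similarity formula the paper uses.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(queries, sim, threshold):
    """Single-link grouping: merge any two queries whose similarity reaches the threshold."""
    parent = {q: q for q in queries}

    def find(q):
        # Follow parent links to the group representative, shortening the path on the way.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for i, a in enumerate(queries):
        for b in queries[i + 1:]:
            if sim(a, b) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for q in queries:
        groups.setdefault(find(q), []).append(q)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical click data: query -> documents users selected.
clicks = {
    "java island": {"indonesia"},
    "java language": {"computer_science"},
    "programming languages": {"computer_science"},
    "eniac history": {"computer_science"},
}
queries = list(clicks)


def keyword_sim(a, b):
    return jaccard(set(a.split()), set(b.split()))


def click_sim(a, b):
    return jaccard(clicks[a], clicks[b])


# Keyword matching merges the two senses of "java".
print(cluster(queries, keyword_sim, 0.3))
# Click matching lumps every query that led to one broad article.
print(cluster(queries, click_sim, 0.3))
```

With this toy data, the keyword run groups "java island" with "java language" because of the ambiguous word, while the click run places three loosely related queries in one broad cluster.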

Information & Contributors

Information

Published In

ACM Transactions on Information Systems Volume 20, Issue 1

January 2002

131 pages

ISSN: 1046-8188

EISSN: 1558-2868

DOI: 10.1145/503104

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 January 2002

Published in TOIS Volume 20, Issue 1

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 268

Total Downloads: 2,522

Downloads (Last 12 months): 6

Downloads (Last 6 weeks): 0

Citations

Cited By

Sultana T, Mandal A, Saha H, Sultan M, Hossain M (2024). Intent Identification by Semantically Analyzing the Search Query. Modelling 5:1 (292-314). Online publication date: 22-Feb-2024.
https://doi.org/10.3390/modelling5010016
Reimer J, Schmidt S, Fröbe M, Gienapp L, Scells H, Stein B, Hagen M, Potthast M, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2848-2860). Online publication date: 19-Jul-2023.
https://dl.acm.org/doi/10.1145/3539618.3591890
Kumar Shukla A, Das S (2022). Deep Neural Network and Pseudo Relevance Feedback Based Query Expansion. Computers, Materials & Continua 71:2 (3557-3570). Online publication date: 2022.
https://doi.org/10.32604/cmc.2022.022411
Chawla S (2021). Web Page Recommender System using hybrid of Genetic Algorithm and Trust for Personalized Web Search. Research Anthology on Multi-Industry Uses of Genetic Programming and Algorithms (656-675). Online publication date: 2021.
https://doi.org/10.4018/978-1-7998-8048-6.ch034
Ma L, Sinha N, Vajge P, Cho J, Kumar S, Achan K (2021). Event-based Product Carousel Recommendation with Query-Click Graph. 2021 IEEE International Conference on Big Data (Big Data) (4119-4125). Online publication date: 15-Dec-2021.
https://doi.org/10.1109/BigData52589.2021.9671649
Gutiérrez-Soto C, Palomino M, Curiel A, Cerda H, Rain F (2020). Evaluating the Effectiveness of Query-Document Clustering Using the QDSM Measure. Advances in Science, Technology and Engineering Systems Journal 5:6 (883-893). Online publication date: Dec-2020.
https://doi.org/10.25046/aj0506105
Benham R, Mackenzie J, Moffat A, Culpepper J (2019). Boosting Search Performance Using Query Variations. ACM Transactions on Information Systems 37:4 (1-25). Online publication date: 4-Oct-2019.
https://dl.acm.org/doi/10.1145/3345001
Shaikhha A, Fitzgibbon A, Vytiniotis D, Peyton Jones S (2019). Efficient differentiable programming in a functional array-processing language. Proceedings of the ACM on Programming Languages 3:ICFP (1-30). Online publication date: 26-Jul-2019.
https://dl.acm.org/doi/10.1145/3341701
Gutierrez-Soto C, Diaz A, Hubert G (2019). Comparing the Effectiveness of Query-Document Clusterings Using the QDSM and Cosine Similarity. 2019 38th International Conference of the Chilean Computer Science Society (SCCC) (1-8). Online publication date: Nov-2019.
https://doi.org/10.1109/SCCC49216.2019.8966432
Azad H, Deepak A (2019). Query expansion techniques for information retrieval: A survey. Information Processing & Management 56:5 (1698-1735). Online publication date: Sep-2019.
https://doi.org/10.1016/j.ipm.2019.05.009
