research-article

A vlHMM approach to context-aware search

Authors:

Hang Li

ACM Transactions on the Web (TWEB), Volume 7, Issue 4

Article No.: 22, Pages 1 - 38

https://doi.org/10.1145/2490255

Published: 01 November 2013

Abstract

Capturing the context of a user's query from the previous queries and clicks in the same session leads to a better understanding of the user's information need. A context-aware approach to document reranking, URL recommendation, and query suggestion may substantially improve users' search experience. In this article, we propose a general approach to context-aware search by learning a variable-length hidden Markov model (vlHMM) from search sessions extracted from log data. While the mathematical model is powerful, the huge volume of log data presents great challenges. We develop several distributed learning techniques to learn a very large vlHMM under the map-reduce framework. Moreover, we construct feature vectors for each state of the vlHMM to handle users' novel queries not covered by the training data. We test our approach on a raw dataset consisting of 1.9 billion queries, 2.9 billion clicks, and 1.2 billion search sessions before filtering, and evaluate the vlHMM learned from the real data on three search applications: document reranking, query suggestion, and URL recommendation. The experimental results validate the effectiveness of the vlHMM in all three applications.
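To make the idea of variable-length session context concrete, the following is a minimal sketch and not the article's method. It assumes queries are directly observable rather than generated by hidden states, and it replaces the vlHMM with a simple count table that backs off from the longest matching context to shorter ones. The function names `train` and `suggest`, the `max_context` parameter, and the toy sessions are all hypothetical.

```python
from collections import Counter, defaultdict


def train(sessions, max_context=3):
    """Count which query follows each context of 1..max_context prior queries.

    Simplification (assumption): contexts are raw query strings, not the
    hidden states a vlHMM would learn.
    """
    table = defaultdict(Counter)
    for session in sessions:
        for i in range(1, len(session)):
            for k in range(1, max_context + 1):
                if i - k < 0:
                    break  # not enough history for a context of length k
                context = tuple(session[i - k:i])
                table[context][session[i]] += 1
    return table


def suggest(table, history, max_context=3, top_n=2):
    """Suggest next queries, preferring the longest context seen in training."""
    for k in range(min(max_context, len(history)), 0, -1):
        context = tuple(history[-k:])
        if context in table:
            return [query for query, _ in table[context].most_common(top_n)]
    return []  # no context of any length was seen in training


# Toy sessions (made up for illustration only).
sessions = [
    ["jaguar", "jaguar price", "jaguar dealer"],
    ["zoo", "jaguar", "jaguar habitat"],
    ["zoo", "jaguar", "jaguar habitat"],
    ["jaguar", "jaguar price"],
    ["jaguar", "jaguar price"],
]
table = train(sessions)
print(suggest(table, ["zoo", "jaguar"]))
print(suggest(table, ["jaguar"]))
```

In this toy data, the history `["zoo", "jaguar"]` matches a length-2 context and yields an animal-related suggestion, while `["jaguar"]` alone falls back to the length-1 context, where the car-related follow-up is more frequent. A real vlHMM would additionally model hidden search intents and click observations, which this sketch omits.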

Index Terms

A vlHMM approach to context-aware search
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published In

ACM Transactions on the Web Volume 7, Issue 4

October 2013

220 pages

ISSN:1559-1131

EISSN:1559-114X

DOI:10.1145/2540635

Copyright © 2013 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 November 2013

Accepted: 01 May 2013

Revised: 01 February 2012

Received: 01 June 2011

Published in TWEB Volume 7, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Natural Sciences and Engineering Research Council of Canada
National Natural Science Foundation of China
Microsoft Research
BCFRST NRAS Endowment Research Team Program project
Networks of Centres of Excellence of Canada
