research-article

Mining search and browse logs for web search: A Survey

Published: 08 October 2013

Abstract

Huge amounts of search log data have been accumulated at Web search engines. Currently, a popular Web search engine may receive billions of queries and collect terabytes of records about user search behavior daily. Besides search log data, huge amounts of browse log data have also been collected through client-side browser plugins. Such massive amounts of search and browse log data provide great opportunities for mining the wisdom of crowds and improving Web search. At the same time, designing effective and efficient methods to clean, process, and model log data also presents great challenges.
In this survey, we focus on mining search and browse log data for Web search. We start with an introduction to search and browse log data and an overview of frequently used data summarizations in log mining. We then elaborate on how log mining applications enhance the five major components of a search engine, namely, query understanding, document understanding, document ranking, user understanding, and monitoring and feedback. For each aspect, we survey the major tasks, fundamental principles, and state-of-the-art methods.

    Published In

    ACM Transactions on Intelligent Systems and Technology  Volume 4, Issue 4
    Survey papers, special sections on the semantic adaptive social web, intelligent systems for health informatics, regular papers
    September 2013
    452 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/2508037
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 October 2013
    Accepted: 01 March 2013
    Revised: 01 February 2013
    Received: 01 October 2012
    Published in TIST Volume 4, Issue 4

    Author Tags

    1. Search logs
    2. Web search
    3. browse log
    4. document ranking
    5. document understanding
    6. feedbacks
    7. log mining
    8. monitoring
    9. query understanding
    10. survey
    11. user understanding

    Qualifiers

    • Research-article
    • Survey
    • Refereed

    Article Metrics

    • Downloads (Last 12 months): 16
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 19 Feb 2025

    Cited By

    • (2024) Exploratory and directed search strategies at a social science data archive. IASSIST Quarterly 48:1. DOI: 10.29173/iq1087. Online publication date: 28-Mar-2024
    • (2024) Understanding user intent modeling for conversational recommender systems: a systematic literature review. User Modeling and User-Adapted Interaction 34:5 (1643-1706). DOI: 10.1007/s11257-024-09398-x. Online publication date: 1-Nov-2024
    • (2023) A Large-Scale Characterization of How Readers Browse Wikipedia. ACM Transactions on the Web 17:2 (1-22). DOI: 10.1145/3580318. Online publication date: 3-Apr-2023
    • (2023) You are how (and where) you search? Comparative analysis of web search behavior using web tracking data. Journal of Computational Social Science 6:2 (741-756). DOI: 10.1007/s42001-023-00208-9. Online publication date: 3-May-2023
    • (2022) Analysis of Information Search around the Time of Childbirth: Estimating Probability Distributions of Search Dates via Mathematical Optimization (出産前後の情報検索の分析:数理最適化による検索日の確率分布推定). Transactions of the Japanese Society for Artificial Intelligence 37:3 (D-L74_1-11). DOI: 10.1527/tjsai.37-3_D-L74. Online publication date: 1-May-2022
    • (2022) A comparison of dataset search behaviour of internal versus search engine referred sessions. Proceedings of the 2022 Conference on Human Information Interaction and Retrieval (158-168). DOI: 10.1145/3498366.3505821. Online publication date: 14-Mar-2022
    • (2022) SoK: Cryptanalysis of Encrypted Search with LEAKER – A framework for LEakage AttacK Evaluation on Real-world data. 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P) (90-108). DOI: 10.1109/EuroSP53844.2022.00014. Online publication date: Jun-2022
    • (2021) ConCaT: Construction of Category Trees from Search Queries in E-Commerce. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (2701-2704). DOI: 10.1109/ICDE51399.2021.00308. Online publication date: Apr-2021
    • (2020) Personalized product search based on user transaction history and hypergraph learning. Multimedia Tools and Applications. DOI: 10.1007/s11042-020-08963-x. Online publication date: 17-May-2020
    • (2020) Identifying User’s Interest in Using E-Payment Systems. Innovations in Computer Science and Engineering (353-361). DOI: 10.1007/978-981-15-2043-3_40. Online publication date: 4-Mar-2020
