Mining Query Logs: Turning Search Usage Data into Knowledge

Author: Fabrizio Silvestri

Foundations and Trends in Information Retrieval, Volume 4, Issue 1-2

Pages 1-174

https://doi.org/10.1561/1500000013

Published: 01 January 2010

Abstract

Since they started operating, Web search engines have stored information about their users in logs. This information serves many purposes. The primary focus of this survey is to introduce the discipline of query mining by presenting its foundations and analyzing the basic algorithms and techniques used to extract useful knowledge from this (potentially) infinite source of information. We show how search applications can benefit from this kind of analysis by examining popular applications of query log mining and their influence on the user experience. We conclude by briefly presenting some of the most challenging open problems in this field.

References

[1]

E. Adar, "User 4xxxxx9: Anonymizing query logs," in Query Log Analysis: Social And Technological Challenges. A Workshop at the 16th International World Wide Web Conference (WWW 2007), (E. Amitay, C. G. Murray, and J. Teevan, eds.), May 2007.

[2]

E. Adar, D. S. Weld, B. N. Bershad, and S. S. Gribble, "Why we search: Visualizing and predicting user behavior," in WWW '07: Proceedings of the 16th International Conference on World Wide Web, pp. 161-170, New York, NY, USA: ACM, 2007.

[3]

A. Agarwal and S. Chakrabarti, "Learning random walks to rank nodes in graphs," in ICML '07: Proceedings of the 24th International Conference on Machine Learning, pp. 9-16, New York, NY, USA: ACM, 2007.

[4]

E. Agichtein, E. Brill, and S. Dumais, "Improving web search ranking by incorporating user behavior information," in SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 19-26, New York, NY, USA: ACM, 2006.

[5]

E. Agichtein, E. Brill, S. Dumais, and R. Ragno, "Learning user interaction models for predicting web search result preferences," in SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3-10, New York, NY, USA: ACM, 2006.

[6]

E. Agichtein and Z. Zheng, "Identifying 'best bet' web search results by mining past user behavior," in KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 902-908, New York, NY, USA: ACM, 2006.

[7]

R. Agrawal, T. Imielinski, and A. N. Swami, "Mining association rules between sets of items in large databases," in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 26-28, 1993, (P. Buneman and S. Jajodia, eds.), pp. 207-216, ACM Press, 1993.

[8]

F. Ahmad and G. Kondrak, "Learning a spelling error model from search query logs," in Proceedings of the 2005 Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pp. 955-962, Vancouver, Canada: Association for Computational Linguistics, October 2005.

[9]

C. Anderson, The Long Tail. Random House Business, 2006.

[10]

A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, and S. Raghavan, "Searching the web," ACM Transactions on Internet Technology, vol. 1, no. 1, pp. 2-43, 2001.

[11]

Various authors, "About web analytics association," retrieved August 2009. http://www.webanalyticsassociation.org/aboutus/.

[12]

R. Baeza-Yates, Web Mining: Applications and Techniques. ch. Query Usage Mining in Search Engines, pp. 307-321, Idea Group, 2004.

[13]

R. Baeza-Yates, "Algorithmic challenges in web search engines," in Proceedings of the 7th Latin American Symposium on Theoretical Informatics (LATIN'06), pp. 1-7, Valdivia, Chile, 2006.

[14]

R. Baeza-Yates, C. Castillo, F. Junqueira, V. Plachouras, and F. Silvestri, "Challenges in distributed information retrieval," in International Conference on Data Engineering (ICDE), Istanbul, Turkey: IEEE CS Press, April 2007.

[15]

R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri, "The impact of caching on search engines," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 183-190, New York, NY, USA: ACM, 2007.

[16]

R. Baeza-Yates, A. Gionis, F. P. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri, "Design trade-offs for search engine caching," ACM Transactions on the Web, vol. 2, no. 4, pp. 1-28, 2008.

[17]

R. Baeza-Yates, C. Hurtado, and M. Mendoza, Query Recommendation Using Query Logs in Search Engines. pp. 588-596. Vol. 3268/2004 of Lecture Notes in Computer Science, Berlin, Heidelberg: Springer, November 2004.

[18]

R. Baeza-Yates, C. Hurtado, and M. Mendoza, "Ranking boosting based in query clustering," in Proceedings of 2004 Atlantic Web Intelligence Conference, Cancun, Mexico, 2004.

[19]

R. Baeza-Yates and A. Tiberi, "Extracting semantic relations from query logs," in KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 76-85, New York, NY, USA: ACM, 2007.

[20]

R. A. Baeza-Yates, "Applications of web query mining," in Advances in Information Retrieval, 27th European Conference on IR Research, ECIR 2005, Santiago de Compostela, Spain, March 21-23, 2005, Proceedings, (D. E. Losada and J. M. Fernández-Luna, eds.), pp. 7-22, Springer, 2005.

[21]

R. A. Baeza-Yates, "Graphs from search engine queries," in SOFSEM 2007: Theory and Practice of Computer Science, 33rd Conference on Current Trends in Theory and Practice of Computer Science, Harrachov, Czech Republic, January 20-26, 2007, Proceedings, (J. van Leeuwen, G. F. Italiano, W. van der Hoek, C. Meinel, H. Sack, and F. Plasil, eds.), pp. 1-8, Springer, 2007.

[22]

R. A. Baeza-Yates, C. A. Hurtado, and M. Mendoza, "Improving search engines by query clustering," JASIST, vol. 58, no. 12, pp. 1793-1804, 2007.

[23]

R. A. Baeza-Yates, C. A. Hurtado, M. Mendoza, and G. Dupret, "Modeling user search behavior," in Third Latin American Web Congress (LAWeb 2005), 31 October - 2 November 2005, Buenos Aires, Argentina, pp. 242-251, IEEE Computer Society, 2005.

[24]

R. A. Baeza-Yates, F. Junqueira, V. Plachouras, and H. F. Witschel, "Admission policies for caches of search engine results," in SPIRE, pp. 74-85, 2007.

[25]

R. A. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999.

[26]

R. A. Baeza-Yates and F. Saint-Jean, "A three level search engine index based in query log distribution," in SPIRE, pp. 56-65, 2003.

[27]

J. Bar-Ilan, "Access to query logs -- an academic researcher's point of view," in Query Log Analysis: Social And Technological Challenges. A Workshop at the 16th International World Wide Web Conference (WWW 2007), (E. Amitay, C. G. Murray, and J. Teevan, eds.), May 2007.

[28]

Z. Bar-Yossef and M. Gurevich, "Mining search engine query logs via suggestion sampling," Proceedings of the VLDB Endowment, vol. 1, no. 1, pp. 54-65, 2008.

[29]

R. Baraglia, F. Cacheda, V. Carneiro, F. Diego, V. Formoso, R. Perego, and F. Silvestri, "Search shortcuts: A new approach to the recommendation of queries," in RecSys '09: Proceedings of the 2009 ACM Conference on Recommender Systems, New York, NY, USA: ACM, 2009.

[30]

R. Baraglia, F. Cacheda, V. Carneiro, V. Formoso, R. Perego, and F. Silvestri, "Search shortcuts: Driving users towards their goals," in WWW '09: Proceedings of the 18th International Conference on World Wide Web, pp. 1073-1074, New York, NY, USA: ACM, 2009.

[31]

R. Baraglia, F. Cacheda, V. Carneiro, V. Formoso, R. Perego, and F. Silvestri, "Search shortcuts using click-through data," in WSCD '09: Proceedings of the 2009 Workshop on Web Search Click Data, pp. 48-55, New York, NY, USA: ACM, 2009.

[32]

R. Baraglia and F. Silvestri, "Dynamic personalization of web sites without user intervention," Communications of the ACM, vol. 50, no. 2, pp. 63-67, 2007.

[33]

L. A. Barroso, J. Dean, and U. Hölzle, "Web search for a planet: The Google cluster architecture," IEEE Micro, vol. 23, no. 2, pp. 22-28, 2003.

[34]

S. M. Beitzel, E. C. Jensen, A. Chowdhury, O. Frieder, and D. Grossman, "Temporal analysis of a very large topically categorized web query log," Journal of the American Society for Information Science and Technology, vol. 58, no. 2, pp. 166-178, 2007.

[35]

S. M. Beitzel, E. C. Jensen, A. Chowdhury, D. Grossman, and O. Frieder, "Hourly analysis of a very large topically categorized web query log," in SIGIR '04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 321-328, New York, NY, USA: ACM, 2004.

[36]

S. M. Beitzel, E. C. Jensen, O. Frieder, D. D. Lewis, A. Chowdhury, and A. Kolcz, "Improving automatic query classification via semi-supervised learning," in ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining, pp. 42-49, Washington, DC, USA: IEEE Computer Society, 2005.

[37]

S. M. Beitzel, E. C. Jensen, D. D. Lewis, A. Chowdhury, and O. Frieder, "Automatic classification of web queries using very large unlabeled query logs," ACM Transactions on Information Systems, vol. 25, no. 2, p. 9, 2007.

[38]

L. A. Belady, "A study of replacement algorithms for a virtual storage computer," IBM Systems Journal, vol. 5, no. 2, pp. 78-101, 1966.

[39]

R. E. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.

[40]

"Beowulf Project at CESDIS," http://www.beowulf.org.

[41]

M. Bilenko and R. W. White, "Mining the search trails of surfing crowds: Identifying relevant websites from user activity," in WWW '08: Proceedings of the 17th International Conference on World Wide Web, pp. 51-60, New York, NY, USA: ACM, 2008.

[42]

B. Billerbeck, F. Scholer, H. E. Williams, and J. Zobel, "Query expansion using associated queries," in Proceedings of the Twelfth International Conference on Information and Knowledge Management, pp. 2-9, ACM Press, 2003.

[43]

P. Boldi and S. Vigna, "The WebGraph framework I: Compression techniques," in WWW '04: Proceedings of the 13th International Conference on World Wide Web, pp. 595-602, New York, NY, USA: ACM Press, 2004.

[44]

J. Boyan, D. Freitag, and T. Joachims, "A machine learning architecture for optimizing web search engines," in Proceedings of the AAAI Workshop on Internet-Based Information Systems, 1996.

[45]

O. Boydell and B. Smyth, "Capturing community search expertise for personalized web search using snippet-indexes," in CIKM '06: Proceedings of the 15th ACM International Conference on Information and Knowledge Management, pp. 277-286, New York, NY, USA: ACM, 2006.

[46]

J. S. Breese, D. Heckerman, and C. M. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in UAI, pp. 43-52, 1998.

[47]

S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," in WWW7: Proceedings of the Seventh International Conference on World Wide Web 7, pp. 107-117, Amsterdam, The Netherlands: Elsevier Science Publishers B.V., 1998.

[48]

A. Z. Broder, "A taxonomy of web search," SIGIR Forum, vol. 36, no. 2, pp. 3-10, 2002.

[49]

A. Z. Broder, M. Fontoura, E. Gabrilovich, A. Joshi, V. Josifovski, and T. Zhang, "Robust classification of rare queries using web knowledge," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 231-238, New York, NY, USA: ACM, 2007.

[50]

A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig, "Syntactic Clustering of the Web," in Selected Papers from the Sixth International Conference on World Wide Web, pp. 1157-1166, Essex, UK: Elsevier Science Publishers Ltd., 1997.

[51]

C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender, "Learning to rank using gradient descent," in ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pp. 89-96, New York, NY, USA: ACM, 2005.

[52]

C. J. C. Burges, R. Ragno, and Q. V. Le, "Learning to rank with nonsmooth cost functions," in NIPS, (B. Schölkopf, J. Platt, and T. Hoffman, eds.), pp. 193-200, MIT Press, 2006.

[53]

R. Buyya, ed., High Performance Cluster Computing. Prentice Hall PTR, 1999.

[54]

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. The MIT Press, 2001.

[55]

J. Callan and M. Connell, "Query-based sampling of text databases," ACM Transactions on Information Systems, vol. 19, no. 2, pp. 97-130, 2001.

[56]

J. P. Callan, Z. Lu, and W. B. Croft, "Searching distributed collections with inference networks," in SIGIR '95: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 21-28, New York, NY, USA: ACM, 1995.

[57]

C. Castillo, "Effective web crawling," PhD thesis, Department of Computer Science -- University of Chile, Santiago, Chile, November 2004.

[58]

J. Caverlee, L. Liu, and J. Bae, "Distributed query sampling: A quality-conscious approach," in SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 340-347, New York, NY, USA: ACM, 2006.

[59]

D. Chakrabarti, R. Kumar, and A. Tomkins, "Evolutionary clustering," in KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 554-560, New York, NY, USA: ACM, 2006.

[60]

Q. Chen, M. Li, and M. Zhou, "Improving query spelling correction using web search results," in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 181-189, Prague, Czech Republic: Association for Computational Linguistics, June 2007.

[61]

F. Chierichetti, A. Panconesi, P. Raghavan, M. Sozio, A. Tiberi, and E. Upfal, "Finding near neighbors through cluster pruning," in Proceedings of ACM SIGMOD/PODS 2007 Conference, 2007.

[62]

P. A. Chirita, C. S. Firan, and W. Nejdl, "Personalized query expansion for the web," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 7-14, New York, NY, USA: ACM, 2007.

[63]

A. Chowdhury, O. Frieder, D. Grossman, and M. C. McCabe, "Collection statistics for fast duplicate document detection," ACM Transactions on Information Systems, vol. 20, no. 2, pp. 171-191, 2002.

[64]

A. Cooper, "A survey of query log privacy-enhancing techniques from a policy perspective," ACM Transactions on the Web, vol. 2, no. 4, pp. 1-27, 2008.

[65]

N. Craswell, P. Bailey, and D. Hawking, "Server selection on the world wide web," in DL '00: Proceedings of the Fifth ACM Conference on Digital Libraries, pp. 37-46, New York, NY, USA: ACM, 2000.

[66]

N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey, "An experimental comparison of click position-bias models," in WSDM '08: Proceedings of the International Conference on Web Search and Web Data Mining, pp. 87-94, New York, NY, USA: ACM, 2008.

[67]

S. Cucerzan and E. Brill, "Spelling correction as an iterative process that exploits the collective knowledge of web users," in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), pp. 293-300, July 2004.

[68]

S. Cucerzan and R. W. White, "Query suggestion based on user landing pages," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 875-876, New York, NY, USA: ACM Press, 2007.

[69]

H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma, "Probabilistic query expansion using query logs," in WWW '02: Proceedings of the 11th International Conference on World Wide Web, pp. 325-332, New York, NY, USA: ACM, 2002.

[70]

E. Cutrell and Z. Guan, "What are you looking for? An eye-tracking study of information usage in web search," in CHI '07: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 407-416, New York, NY, USA: ACM, 2007.

[71]

F. J. Damerau, "A technique for computer detection and correction of spelling errors," Communications of the ACM, vol. 7, no. 3, pp. 171-176, 1964.

[72]

I. S. Dhillon, S. Mallela, and D. S. Modha, "Information-theoretic co-clustering," in Proceedings of The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp. 89-98, 2003.

[73]

Z. Dou, R. Song, and J. Wen, "A large-scale evaluation and analysis of personalized search strategies," in Proceedings of the 16th International World Wide Web Conference (WWW2007), pp. 572-581, May 2007.

[74]

T. Fagni, R. Perego, F. Silvestri, and S. Orlando, "Boosting the performance of web search engines: Caching and prefetching query results by exploiting historical usage data," ACM Transactions on Information Systems, vol. 24, no. 1, pp. 51-78, 2006.

[75]

C. H. Fenichel, "Online searching: Measures that discriminate among users with different types of experience," JASIS, vol. 32, no. 1, pp. 23-32, 1981.

[76]

P. Ferragina and A. Gulli, "A personalized search engine based on web-snippet hierarchical clustering," in WWW '05: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, pp. 801-810, New York, NY, USA: ACM, 2005.

[77]

L. Fitzpatrick and M. Dent, "Automatic feedback using past queries: social searching?," in SIGIR '97: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 306-313, New York, NY, USA: ACM, 1997.

[78]

B. M. Fonseca, P. B. Golgher, E. S. de Moura, and N. Ziviani, "Using association rules to discover search engines related queries," in LA-WEB '03: Proceedings of the First Conference on Latin American Web Congress, p. 66, Washington, DC, USA: IEEE Computer Society, 2003.

[79]

I. Foster and C. Kesselman, eds., The Grid: Blueprint for a Future Computing Infrastructure. Morgan-Kaufmann, 1999.

[80]

I. Foster, C. Kesselman, and S. Tuecke, "The anatomy of the grid: Enabling scalable virtual organizations," International Journal of Supercomputer Applications, vol. 15, no. 3, 2001.

[81]

Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, "An efficient boosting algorithm for combining preferences," Journal of Machine Learning Research, vol. 4, pp. 933-969, 2003.

[82]

N. Fuhr, "Optimal polynomial retrieval functions based on the probability ranking principle," ACM Transactions on Information Systems, vol. 7, no. 3, pp. 183-204, 1989.

[83]

N. Fuhr, "A decision-theoretic approach to database selection in networked IR," ACM Transactions on Information Systems, vol. 17, no. 3, pp. 229-249, 1999.

[84]

N. Fuhr, S. Hartmann, G. Knorz, G. Lustig, M. Schwantner, and K. Tzeras, "AIR/X--a rule-based multistage indexing system for large subject fields," in Proceedings of the RIAO'91, Barcelona, Spain, April 2-5, 1991, pp. 606-623, 1991.

[85]

G. P. C. Fung, J. X. Yu, P. S. Yu, and H. Lu, "Parameter free bursty events detection in text streams," in VLDB '05: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 181-192, VLDB Endowment, 2005.

[86]

G. W. Furnas, S. C. Deerwester, S. T. Dumais, T. K. Landauer, R. A. Harshman, L. A. Streeter, and K. E. Lochbaum, "Information retrieval using a singular value decomposition model of latent semantic structure," in SIGIR, pp. 465-480, 1988.

[87]

G. Galilei, "Discorsi e dimostrazioni matematiche intorno a due nuove scienze," Leida : Appresso gli Elsevirii, 1638.

[88]

L. A. Granka, T. Joachims, and G. Gay, "Eye-tracking analysis of user behavior in www search," in SIGIR '04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information retrieval, pp. 478-479, New York, NY, USA: ACM, 2004.

[89]

L. Gravano, H. Garcia-Molina, and A. Tomasic, "The efficacy of GlOSS for the text database discovery problem," Technical Report, Stanford University, Stanford, CA, USA, 1993.

[90]

L. Gravano, H. García-Molina, and A. Tomasic, "The effectiveness of GlOSS for the text database discovery problem," in SIGMOD '94: Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, pp. 126-137, New York, NY, USA: ACM, 1994.

[91]

L. Gravano, H. García-Molina, and A. Tomasic, "GlOSS: Text-source discovery over the internet," ACM Transactions on Database Systems, vol. 24, no. 2, pp. 229-264, 1999.

[92]

L. Gravano, V. Hatzivassiloglou, and R. Lichtenstein, "Categorizing web queries according to geographical locality," in CIKM '03: Proceedings of the Twelfth International Conference on Information and Knowledge Management, pp. 325-333, New York, NY, USA: ACM, 2003.

[93]

Z. Guan and E. Cutrell, "An eye tracking study of the effect of target rank on web search," in CHI '07: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 417-420, New York, NY, USA: ACM, 2007.

[94]

T. H. Haveliwala, "Topic-sensitive PageRank," in WWW '02: Proceedings of the 11th International Conference on World Wide Web, pp. 517-526, New York, NY, USA: ACM, 2002.

[95]

D. Hawking, "Overview of the TREC-9 web track," in TREC, 2000.

[96]

D. Hawking, "Web search engines: Part 1," Computer, vol. 39, no. 6, pp. 86-88, 2006.

[97]

D. Hawking, "Web search engines: Part 2," Computer, vol. 39, no. 8, pp. 88-90, 2006.

[98]

D. Hawking and P. Thistlewaite, "Methods for information server selection," ACM Transactions on Information Systems, vol. 17, no. 1, pp. 40-76, 1999.

[99]

J. Hennessy and D. Patterson, Computer Architecture -- A Quantitative Approach. Morgan Kaufmann, 2003.

[100]

M. R. Henzinger, "Algorithmic challenges in web search engines," Internet Mathematics, vol. 1, no. 1, 2003.

[101]

M. R. Henzinger, R. Motwani, and C. Silverstein, "Challenges in web search engines," SIGIR Forum, vol. 36, no. 2, pp. 11-22, 2002.

[102]

T. C. Hoad and J. Zobel, "Methods for identifying versioned and plagiarized documents," Journal of the American Society for Information Science and Technology, vol. 54, no. 3, pp. 203-215, 2003.

[103]

I. Hsieh-Yee, "Effects of search experience and subject knowledge on the search tactics of novice and experienced searchers," JASIS, vol. 44, no. 3, pp. 161-174, 1993.

[104]

A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: A review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.

[105]

B. J. Jansen and M. Resnick, "An examination of searcher's perceptions of nonsponsored and sponsored links during ecommerce web searching," Journal of the American Society for Information Science and Technology, vol. 57, no. 14, pp. 1949-1961, 2006.

[106]

B. J. Jansen and A. Spink, "An analysis of web searching by European AlltheWeb.com users," Information Processing and Management, vol. 41, no. 2, pp. 361-381, 2005.

[107]

B. J. Jansen and A. Spink, "How are we searching the world wide web? A comparison of nine search engine transaction logs," Information Processing and Management, vol. 42, no. 1, pp. 248-263, 2006.

[108]

B. J. Jansen, A. Spink, J. Bateman, and T. Saracevic, "Real life information retrieval: A study of user queries on the web," SIGIR Forum, vol. 32, no. 1, pp. 5-17, 1998.

[109]

B. J. Jansen, A. Spink, and S. Koshman, "Web searcher interaction with the Dogpile.com metasearch engine," JASIST, vol. 58, no. 5, pp. 744-755, 2007.

[110]

B. J. Jansen, "Understanding user-web interactions via web analytics," Synthesis Lectures on Information Concepts, Retrieval, and Services, vol. 1, no. 1, pp. 1-102, 2009.

[111]

T. Joachims, "Optimizing search engines using clickthrough data," in KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 133-142, New York, NY, USA: ACM Press, 2002.

[112]

T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay, "Accurately interpreting clickthrough data as implicit feedback," in SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 154-161, New York, NY, USA: ACM, 2005.

[113]

T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay, "Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search," ACM Transactions on Information Systems, vol. 25, no. 2, p. 7, 2007.

[114]

T. Joachims, H. Li, T.-Y. Liu, and C. Zhai, "Learning to rank for information retrieval (LR4IR 2007)," SIGIR Forum, vol. 41, no. 2, pp. 58-62, 2007.

[115]

T. Joachims and F. Radlinski, "Search engines that learn from implicit feedback," Computer, vol. 40, no. 8, pp. 34-40, 2007.

[116]

K. S. Jones, S. Walker, and S. E. Robertson, "A probabilistic model of information retrieval: Development and comparative experiments," Information Processing and Management, vol. 36, no. 6, pp. 779-808, 2000.

[117]

R. Jones, R. Kumar, B. Pang, and A. Tomkins, "'I know what you did last summer': Query logs and user privacy," in CIKM '07: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, pp. 909-914, New York, NY, USA: ACM, 2007.

[118]

R. Jones, B. Rey, O. Madani, and W. Greiner, "Generating query substitutions," in WWW '06: Proceedings of the 15th International Conference on World Wide Web, pp. 387-396, New York, NY, USA: ACM Press, 2006.

[119]

R. Karedla, J. S. Love, and B. G. Wherry, "Caching strategies to improve disk system performance," Computer, vol. 27, no. 3, pp. 38-46, 1994.

[120]

M. Kendall, Rank Correlation Methods. Hafner, 1955.

[121]

J. Kleinberg, "Bursty and hierarchical structure in streams," in KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 91-101, New York, NY, USA: ACM, 2002.

[122]

J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," Journal of the ACM, vol. 46, no. 5, pp. 604-632, 1999.

[123]

S. Koshman, A. Spink, and B. J. Jansen, "Web searching on the Vivisimo search engine," JASIST, vol. 57, no. 14, pp. 1875-1887, 2006.

[124]

M. Koster, "ALIWEB: Archie-like indexing in the web," Computer Networks and ISDN Systems, vol. 27, no. 2, pp. 175-182, 1994.

[125]

S. Kullback and R. A. Leibler, "On information and sufficiency," Annals of Mathematical Statistics, vol. 22, pp. 79-86, 1951.

[126]

R. Kumar, J. Novak, B. Pang, and A. Tomkins, "On anonymizing query logs via token-based hashing," in WWW '07: Proceedings of the 16th International Conference on World Wide Web, pp. 629-638, New York, NY, USA: ACM, 2007.

[127]

T. Lau and E. Horvitz, "Patterns of search: analyzing and modeling web query refinement," in UM '99: Proceedings of the Seventh International Conference on User Modeling, pp. 119-128, Secaucus, NJ, USA: Springer-Verlag New York, Inc., 1999.

[128]

U. Lee, Z. Liu, and J. Cho, "Automatic identification of user goals in web search," in WWW '05: Proceedings of the 14th International Conference on World Wide Web, pp. 391-400, New York, NY, USA: ACM, 2005.

[129]

R. Lempel and S. Moran, "Predictive caching and prefetching of query results in search engines," in WWW '03: Proceedings of the 12th International Conference on World Wide Web, pp. 19-28, New York, NY, USA: ACM, 2003.

[130]

R. Lempel and S. Moran, "Competitive caching of query results in search engines," Theoretical Computer Science, vol. 324, no. 2-3, pp. 253-271, 2004.

[131]

R. Lempel and S. Moran, "Optimizing result prefetching in web search engines with segmented indices," ACM Transactions on Internet Technology, vol. 4, no. 1, pp. 31-59, 2004.

[132]

R. Lempel and F. Silvestri, "Web search result caching and prefetching," Encyclopedia of Database Systems, Springer Verlag, 2008.

[133]

M. Li, Y. Zhang, M. Zhu, and M. Zhou, "Exploring distributional similarity based models for query spelling correction," in ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 1025-1032, Morristown, NJ, USA: Association for Computational Linguistics, 2006.

[134]

Y. Li, Z. Zheng, and H. K. Dai, "KDD Cup-2005 report: Facing a great challenge," SIGKDD Explorations Newsletter, vol. 7, no. 2, pp. 91-99, 2005.

[135]

F. Liu, C. Yu, and W. Meng, "Personalized web search by mapping user queries to categories," in CIKM '02: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 558-565, New York, NY, USA: ACM Press, 2002.

[136]

Live Search Team at Microsoft, "Local, relevance, and japan!," http://blogs.msdn.com/livesearch/archive/2005/06/21/431288.aspx, 2005.

[137]

X. Long and T. Suel, "Three-level caching for efficient query processing in large web search engines," in WWW '05: Proceedings of the 14th International Conference on World Wide Web, pp. 257-266, New York, NY, USA: ACM, 2005.

[138]

R. M. Losee and L. Church, Jr., "Information retrieval with distributed databases: Analytic models of performance," IEEE Transactions on Parallel & Distributed Systems, vol. 15, no. 1, pp. 18-27, 2004.

[139]

C. Lucchese, S. Orlando, R. Perego, and F. Silvestri, "Mining query logs to optimize index partitioning in parallel web search engines," in InfoScale '07: Proceedings of the 2nd International Conference on Scalable Information Systems, New York, NY, USA: ACM, 2007.

[140]

T.-Y. Liu, "Learning to rank for information retrieval," Foundations and Trends in Information Retrieval, vol. 3, no. 3, 2008.

[141]

Y. Lv, L. Sun, J. Zhang, J.-Y. Nie, W. Chen, and W. Zhang, "An iterative implicit feedback approach to personalized search," in ACL '06: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL, pp. 585-592, Morristown, NJ, USA: Association for Computational Linguistics, 2006.

[142]

C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999.

[143]

M. Marchiori, "The quest for correct information on the web: Hyper search engines," Computer Networks, vol. 29, no. 8-13, pp. 1225-1236, 1997.

[144]

E. P. Markatos, "On caching search engine query results," Computer Communications, vol. 24, pp. 137-143, 1 February 2000.

[145]

M. Mat-Hassan and M. Levene, "Associating search and navigation behavior through log analysis," Journal of the American Society for Information Science and Technology, vol. 56, no. 9, pp. 913-934, 2005.

[146]

O. A. McBryan, "GENVL and WWWW: Tools for taming the web," in Proceedings of the First International World Wide Web Conference, (O. Nierstrasz, ed.), p. 15, CERN, Geneva, 1994.

[147]

S. Melink, S. Raghavan, B. Yang, and H. Garcia-Molina, "Building a distributed full-text index for the web," ACM Transactions on Information Systems, vol. 19, no. 3, pp. 217-241, 2001.

[148]

T. Mitchell, Machine Learning. McGraw-Hill International Editions, 1997.

[149]

A. Moffat, W. Webber, and J. Zobel, "Load balancing for term-distributed parallel retrieval," in SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 348-355, New York, NY, USA: ACM, 2006.

[150]

A. Moffat, W. Webber, J. Zobel, and R. Baeza-Yates, "A pipelined architecture for distributed text query evaluation," Information Retrieval, vol. 10, no. 3, pp. 205-231, 2007.

[151]

A. Moffat and J. Zobel, "Information retrieval systems for large document collections," in TREC, 1994.

[152]

E. J. O'Neil, P. E. O'Neil, and G. Weikum, "An optimality proof of the LRU-K page replacement algorithm," Journal of the ACM, vol. 46, no. 1, pp. 92-112, 1999.

[153]

S. Orlando, R. Perego, and F. Silvestri, "Design of a parallel and distributed Web search engine," in Proceedings of Parallel Computing (ParCo) 2001 conference, Imperial College Press, September 2001.

[154]

H. C. Ozmutlu, A. Spink, and S. Ozmutlu, "Analysis of large data logs: An application of Poisson sampling on Excite web queries," Information Processing and Management, vol. 38, no. 4, pp. 473-490, 2002.

[155]

S. Ozmutlu, H. C. Ozmutlu, and A. Spink, "Multitasking web searching and implications for design," JASIST, vol. 40, no. 1, pp. 416-421, 2003.

[156]

S. Ozmutlu, A. Spink, and H. C. Ozmutlu, "A day in the life of web searching: An exploratory study," Information Processing and Management, vol. 40, no. 2, pp. 319-345, 2004.

[157]

L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Technical Report, Stanford Digital Library Technologies Project, 1998.

[158]

S. Pandey and C. Olston, "User-centric web crawling," in WWW '05: Proceedings of the 14th International Conference on World Wide Web, pp. 401-411, New York, NY, USA: ACM, 2005.

[159]

S. Pandey and C. Olston, "Crawl ordering by search impact," in WSDM '08: Proceedings of the international conference on Web search and web data mining, pp. 3-14, New York, NY, USA: ACM, 2008.

[160]

G. Pass, A. Chowdhury, and C. Torgeson, "A picture of search," in InfoScale '06: Proceedings of the First International Conference on Scalable Information Systems, p. 1, New York, NY, USA: ACM, 2006.

[161]

"Pew Research Center for the People & the Press," WWW page, 2007. http://people-press.org/.

[162]

J. Piskorski and M. Sydow, "String distance metrics for reference matching and search query correction," in Business Information Systems, 10th International Conference, BIS 2007, Poznan, Poland, April 2007, (W. Abramowicz, ed.), pp. 356-368, Springer-Verlag, 2007.

[163]

J. Pitkow, H. Schütze, T. Cass, R. Cooley, D. Turnbull, A. Edmonds, E. Adar, and T. Breuel, "Personalized search," Communications of the ACM, vol. 45, no. 9, pp. 50-55, 2002.

[164]

B. Poblete, M. Spiliopoulou, and R. Baeza-Yates, "Website privacy preservation for query log publishing," in First International Workshop on Privacy, Security, and Trust in KDD (PINKDD'07), August 2007.

[165]

S. Podlipnig and L. Böszörmenyi, "A survey of web cache replacement strategies," ACM Computing Surveys, vol. 35, no. 4, pp. 374-398, 2003.

[166]

A. L. Powell and J. C. French, "Comparing the performance of collection selection algorithms," ACM Transactions on Information Systems, vol. 21, no. 4, pp. 412-456, 2003.

[167]

A. L. Powell, J. C. French, J. Callan, M. Connell, and C. L. Viles, "The impact of database selection on distributed searching," in SIGIR '00: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 232-239, New York, NY, USA: ACM, 2000.

[168]

W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C. Cambridge University Press, Second ed., 1992.

[169]

D. Puppin, "A search engine architecture based on collection selection," PhD thesis, Dipartimento di Informatica, Università di Pisa, Pisa, Italy, December 2007.

[170]

D. Puppin and F. Silvestri, "The query-vector document model," in CIKM '06: Proceedings of the 15th ACM International Conference on Information and Knowledge Management, pp. 880-881, New York, NY, USA: ACM, 2006.

[171]

D. Puppin, F. Silvestri, and D. Laforenza, "Query-driven document partitioning and collection selection," in InfoScale '06: Proceedings of the First International Conference on Scalable Information Systems, p. 34, New York, NY, USA: ACM, 2006.

[172]

D. Puppin, F. Silvestri, R. Perego, and R. Baeza-Yates, "Tuning the capacity of search engines: Load-driven routing and incremental caching to reduce and balance the load," ACM Transactions on Information Systems.

[173]

D. Puppin, F. Silvestri, R. Perego, and R. Baeza-Yates, "Load-balancing and caching for collection selection architectures," in InfoScale '07: Proceedings of the 2nd International Conference on Scalable Information Systems, New York, NY, USA: ACM, 2007.

[174]

F. Qiu and J. Cho, "Automatic Identification of User Interest for Personalized Search," in WWW '06: Proceedings of the 15th International Conference on World Wide Web, pp. 727-736, New York, NY, USA: ACM, 2006.

[175]

F. Radlinski and T. Joachims, "Query chains: learning to rank from implicit feedback," in KDD '05: Proceeding of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 239-248, New York, NY, USA: ACM Press, 2005.

[176]

F. Radlinski and T. Joachims, "Active exploration for learning rankings from clickthrough data," in KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 570-579, New York, NY, USA: ACM, 2007.

[177]

K. H. Randall, R. Stata, J. L. Wiener, and R. G. Wickremesinghe, "The link database: Fast access to graphs of the web," in DCC '02: Proceedings of the Data Compression Conference (DCC '02), p. 122, Washington, DC, USA: IEEE Computer Society, 2002.

[178]

S. E. Robertson and S. Walker, "Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval," in SIGIR '94: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 232-241, New York, NY, USA: Springer-Verlag New York, Inc., 1994.

[179]

S. E. Robertson and S. Walker, "Okapi/Keenbow at TREC-8," in TREC, 1999.

[180]

J. T. Robinson and M. V. Devarakonda, "Data cache management using frequency-based replacement," SIGMETRICS Performance Evaluation Review, vol. 18, no. 1, pp. 134-142, 1990.

[181]

J. Rocchio, Relevance Feedback in Information Retrieval. Prentice-Hall, 1971.

[182]

G. Salton and C. Buckley, "Parallel text search methods," Communications of the ACM, vol. 31, no. 2, pp. 202-215, 1988.

[183]

G. Salton and C. Buckley, "Improving retrieval performance by relevance feedback," JASIS, vol. 41, no. 4, pp. 288-297, 1990.

[184]

G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. New York, NY, USA: McGraw-Hill, Inc., 1986.

[185]

M. Sanderson and S. T. Dumais, "Examining repetition in user search behavior," in ECIR, pp. 597-604, 2007.

[186]

P. C. Saraiva, E. S. de Moura, N. Ziviani, W. Meira, R. Fonseca, and B. Ribeiro-Neto, "Rank-preserving two-level caching for scalable search engines," in SIGIR '01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 51-58, New York, NY, USA: ACM, 2001.

[187]

F. Scholer, H. E. Williams, and A. Turpin, "Query association surrogates for web search," Journal of the American Society for Information Science and Technology, vol. 55, no. 7, pp. 637-650, 2004.

[188]

"Search engine use shoots up in the past year and edges towards email as the primary internet application," WWW page, 2005. http://www.pewinternet.org/pdfs/PIP_SearchData_1105.pdf.

[189]

"Search engine users," WWW page, 2005. http://www.pewinternet.org/pdfs/PIP_Searchengine_users.pdf.

[190]

"Search engine users," White paper, 2005. http://www.enquiroresearch.com/personalization/.

[191]

F. Sebastiani, "Machine learning in automated text categorization," ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.

[192]

D. Shen, R. Pan, J.-T. Sun, J. J. Pan, K. Wu, J. Yin, and Q. Yang, "Q2C@UST: Our winning solution to query classification in KDDCUP 2005," SIGKDD Explorations Newsletter, vol. 7, no. 2, pp. 100-110, 2005.

[193]

X. Shen, B. Tan, and C. Zhai, "UCAIR: A personalized search toolbar," in SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 681-681, New York, NY, USA: ACM, 2005.

[194]

M. Shokouhi, J. Zobel, and Y. Bernstein, "Distributed text retrieval from overlapping collections," in ADC '07: Proceedings of the Eighteenth Conference on Australasian Database, pp. 141-150, Darlinghurst, Australia: Australian Computer Society, Inc., 2007.

[195]

M. Shokouhi, J. Zobel, S. Tahaghoghi, and F. Scholer, "Using query logs to establish vocabularies in distributed information retrieval," Information Processing and Management, vol. 43, no. 1, pp. 169-180, 2007.

[196]

L. Si and J. Callan, "Using sampled data and regression to merge search engine results," in SIGIR '02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 19-26, New York, NY, USA: ACM, 2002.

[197]

L. Si and J. Callan, "Relevant document distribution estimation method for resource selection," in SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 298-305, New York, NY, USA: ACM, 2003.

[198]

S. Siegfried, M. J. Bates, and D. N. Wilde, "A profile of end-user searching behavior by humanities scholars: The Getty Online Searching Project report no. 2," JASIS, vol. 44, no. 5, pp. 273-291, 1993.

[199]

C. Silverstein, M. Henzinger, H. Marais, and M. Moricz, "Analysis of a very large AltaVista query log," Technical Report, Systems Research Center, 130 Lytton Avenue, Palo Alto, California 94301, 1998.

[200]

C. Silverstein, H. Marais, M. Henzinger, and M. Moricz, "Analysis of a very large web search engine query log," SIGIR Forum, vol. 33, no. 1, pp. 6-12, 1999.

[201]

F. Silvestri, "High performance issues in web search engines: Algorithms and techniques," PhD thesis, Dipartimento di Informatica, Università di Pisa, Pisa, Italy, May 2004.

[202]

F. Silvestri, "Sorting out the document identifier assignment problem," in Proceedings of the 29th European Conference on Information Retrieval, April 2007.

[203]

F. Silvestri, S. Orlando, and R. Perego, "Assigning identifiers to documents to enhance the clustering property of fulltext indexes," in SIGIR '04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 305-312, New York, NY, USA: ACM, 2004.

[204]

F. Silvestri, S. Orlando, and R. Perego, "Wings: A parallel indexer for web contents," in International Conference on Computational Science, pp. 263-270, 2004.

[205]

D. D. Sleator and R. E. Tarjan, "Amortized efficiency of list update and paging rules," Communications of the ACM, vol. 28, no. 2, pp. 202-208, 1985.

[206]

A. J. Smith, "Cache memories," ACM Computing Surveys, vol. 14, no. 3, pp. 473-530, 1982.

[207]

M. Speretta and S. Gauch, "Personalized search based on user search histories," in Web Intelligence, pp. 622-628, 2005.

[208]

A. Spink, B. J. Jansen, D. Wolfram, and T. Saracevic, "From e-sex to e-commerce: Web search changes," Computer, vol. 35, no. 3, pp. 107-109, 2002.

[209]

A. Spink, S. Koshman, M. Park, C. Field, and B. J. Jansen, "Multitasking web search on Vivisimo.com," in ITCC '05: Proceedings of the International Conference on Information Technology: Coding and Computing, (ITCC'05) Volume II, pp. 486-490, Washington, DC, USA: IEEE Computer Society, 2005.

[210]

A. Spink, H. C. Ozmutlu, and D. P. Lorence, "Web searching for sexual information: An exploratory study," Information Processing and Management, vol. 40, no. 1, pp. 113-123, 2004.

[211]

A. Spink and T. Saracevic, "Interaction in information retrieval: Selection and effectiveness of search terms," JASIS, vol. 48, no. 8, pp. 741-761, 1997.

[212]

A. Spink, D. Wolfram, M. B. J. Jansen, and T. Saracevic, "Searching the web: the public and their queries," Journal of the American Society for Information Science and Technology, vol. 52, pp. 226-234, February 2001.

[213]

J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, "Web usage mining: Discovery and applications of usage patterns from web data," SIGKDD Explorations, vol. 1, no. 2, pp. 12-23, 2000.

[214]

J. Teevan, E. Adar, R. Jones, and M. Potts, "History repeats itself: Repeat queries in Yahoo's logs," in SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 703-704, New York, NY, USA: ACM, 2006.

[215]

J. Teevan, E. Adar, R. Jones, and M. A. S. Potts, "Information re-retrieval: Repeat queries in Yahoo's logs," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 151-158, New York, NY, USA: ACM, 2007.

[216]

J. Teevan, S. T. Dumais, and E. Horvitz, "Beyond the commons: Investigating the value of personalizing web search," in Proceedings of Workshop on New Technologies for Personalized Information Access (PIA '05), Edinburgh, Scotland, UK, 2005.

[217]

J. Teevan, S. T. Dumais, and E. Horvitz, "Personalizing search via automated analysis of interests and activities," in SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 449-456, New York, NY, USA: ACM Press, 2005.

[218]

"The Associated Press: Internet ad revenue exceeds $21b in 2007," 2008. http://ap.google.com/article/ALeqM5hccYd6ZuXTns2RWXUgh6br4n1UoQD8V1GGC00.

[219]

H. Turtle and J. Flood, "Query evaluation: Strategies and optimizations," Information Processing and Management, vol. 31, no. 6, pp. 831-850, 1995.

[220]

M. van Erp and L. Schomaker, "Variants of the Borda count method for combining ranked classifier hypotheses," in Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, pp. 443-452, International Unipen Foundation, 2000.

[221]

C. J. van Rijsbergen, Information Retrieval. London: Butterworths, 2nd ed., 1979.

[222]

M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos, "Identifying similarities, periodicities and bursts for online search queries," in SIGMOD '04: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pp. 131-142, New York, NY, USA: ACM, 2004.

[223]

M. Vlachos, P. S. Yu, V. Castelli, and C. Meek, "Structural periodic measures for time-series data," Data Mining and Knowledge Discovery, vol. 12, no. 1, pp. 1-28, 2006.

[224]

D. Vogel, S. Bickel, P. Haider, R. Schimpfky, P. Siemen, S. Bridges, and T. Scheffer, "Classifying search engine queries using the web as background knowledge," SIGKDD Explorations Newsletter, vol. 7, no. 2, pp. 117-122, 2005.

[225]

X. Wang and C. Zhai, "Learn from web search logs to organize search results," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 87-94, New York, NY, USA: ACM, 2007.

[226]

R. Weiss, B. Vélez, and M. A. Sheldon, "HyPursuit: A hierarchical network search engine that exploits content-link hypertext clustering," in HYPERTEXT '96: Proceedings of the Seventh ACM Conference on Hypertext, pp. 180-193, New York, NY, USA: ACM, 1996.

[227]

R. W. White, M. Bilenko, and S. Cucerzan, "Studying the use of popular destinations to enhance web search interaction," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 159-166, New York, NY, USA: ACM, 2007.

[228]

R. W. White, M. Bilenko, and S. Cucerzan, "Leveraging popular destinations to enhance web search interaction," ACM Transactions on the Web, vol. 2, no. 3, pp. 1-30, 2008.

[229]

R. W. White and D. Morris, "Investigating the querying and browsing behavior of advanced search engine users," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 255-262, New York, NY, USA: ACM, 2007.

[230]

L. Xiong and E. Agichtein, "Towards privacy-preserving query log publishing," in Query Log Analysis: Social And Technological Challenges. A workshop at the 16th International World Wide Web Conference (WWW 2007), (E. Amitay, C. G. Murray, and J. Teevan, eds.), May 2007.

[231]

J. Xu and J. Callan, "Effective retrieval with distributed collections," in SIGIR '98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 112-120, New York, NY, USA: ACM, 1998.

[232]

J. Xu and W. B. Croft, "Cluster-based language models for distributed retrieval," in SIGIR '99: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 254-261, New York, NY, USA: ACM, 1999.

[233]

J. Xu and W. B. Croft, "Improving the effectiveness of information retrieval with local context analysis," ACM Transactions on Information Systems, vol. 18, no. 1, pp. 79-112, 2000.

[234]

J. L. Xu and A. Spink, "Web research: The Excite study," in WebNet 2000, pp. 581-585, 2000.

[235]

Yahoo! Grid, "Open source distributed computing: Yahoo's hadoop support," http://developer.yahoo.net/blog/archives/2007/07/yahoo-hadoop.html, 2007.

[236]

Y. Yang and C. G. Chute, "An example-based mapping method for text categorization and retrieval," ACM Transactions on Information Systems, vol. 12, no. 3, pp. 252-277, 1994.

[237]

Y. Yue, T. Finley, F. Radlinski, and T. Joachims, "A support vector method for optimizing average precision," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 271-278, New York, NY, USA: ACM, 2007.

[238]

B. Yuwono and D. L. Lee, "Server ranking for distributed text retrieval systems on the internet," in Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA), pp. 41-50, World Scientific Press, 1997.

[239]

O. R. Zaïane and A. Strilets, "Finding similar queries to satisfy searches based on query traces," in OOIS Workshops, pp. 207-216, 2002.

[240]

J. Zhang and T. Suel, "Optimized inverted list assignment in distributed search engine architectures," in IPDPS, pp. 1-10, 2007.

[241]

Y. Zhang and A. Moffat, "Some observations on user search behavior," in Proceedings of the 11th Australasian Document Computing Symposium, Brisbane, Australia, 2006.

[242]

Z. Zhang and O. Nasraoui, "Mining search engine query logs for query recommendation," in WWW '06: Proceedings of the 15th international conference on World Wide Web, pp. 1039-1040, New York, NY, USA: ACM, 2006.

[243]

Q. Zhao, S. C. H. Hoi, T.-Y. Liu, S. S. Bhowmick, M. R. Lyu, and W.-Y. Ma, "Time-dependent semantic similarity measure of queries using historical click-through data," in WWW '06: Proceedings of the 15th international conference on World Wide Web, pp. 543-552, New York, NY, USA: ACM, 2006.

[244]

Z. Zheng, K. Chen, G. Sun, and H. Zha, "A regression framework for learning ranking functions using relative relevance judgments," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 287-294, New York, NY, USA: ACM, 2007.

[245]

G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, 1949.

[246]

J. Zobel and A. Moffat, "Inverted files for text search engines," ACM Computing Surveys, vol. 38, no. 2, p. 6, 2006.

Cited By

D. Pàmies-Estrems and J. Garcia-Alfaro, "On the self-adjustment of privacy safeguards for query log streams," Computers and Security, vol. 134, no. C, 2023. doi:10.1016/j.cose.2023.103450 (online publication date: 1 November 2023).
https://dl.acm.org/doi/10.1016/j.cose.2023.103450
L. Ibáñez and E. Simperl, "A comparison of dataset search behaviour of internal versus search engine referred sessions," in Proceedings of the 2022 Conference on Human Information Interaction and Retrieval, (D. Elsweiler, U. Kruschwitz, and B. Ludwig, eds.), pp. 158-168, 2022. doi:10.1145/3498366.3505821 (online publication date: 14 March 2022).
https://dl.acm.org/doi/10.1145/3498366.3505821
G. Navarro, "Indexing Highly Repetitive String Collections, Part II," ACM Computing Surveys, vol. 54, no. 2, pp. 1-32, 2021. doi:10.1145/3432999 (online publication date: 9 February 2021).
https://dl.acm.org/doi/10.1145/3432999

Mining Query Logs: Turning Search Usage Data into Knowledge
1. Information systems
  1. Information retrieval

Information & Contributors

Information

Published In

Foundations and Trends in Information Retrieval Volume 4, Issue 1—2

January 2010

176 pages

ISSN: 1554-0669

EISSN: 1554-0677

Publisher

Now Publishers Inc.

Hanover, MA, United States

Publication History

Published: 01 January 2010

Qualifiers

Article

Bibliometrics & Citations

Article Metrics

Total Citations: 103
Total Downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Citations

Cited By

S. Palani, Z. Ding, A. Nguyen, A. Chuang, S. MacNeil, and S. Dow, "CoNotate: Suggesting Queries Based on Notes Promotes Knowledge Discovery," in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, (Y. Kitamura, A. Quigley, K. Isbister, T. Igarashi, P. Bjørn, and S. Drucker, eds.), pp. 1-14, 2021. doi:10.1145/3411764.3445618 (online publication date: 6 May 2021).
https://dl.acm.org/doi/10.1145/3411764.3445618
X. Li, M. de Rijke, Y. Liu, J. Mao, W. Ma, M. Zhang, and S. Ma, "Investigating Session Search Behavior with Knowledge Graphs," in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, (F. Diaz, C. Shah, T. Suel, P. Castells, R. Jones, and T. Sakai, eds.), pp. 1708-1712, 2021. doi:10.1145/3404835.3463107 (online publication date: 11 July 2021).
https://dl.acm.org/doi/10.1145/3404835.3463107
I. Mele, C. Muntean, F. Nardini, R. Perego, N. Tonellotto, and O. Frieder, "Adaptive utterance rewriting for conversational search," Information Processing and Management: an International Journal, vol. 58, no. 6, 2021. doi:10.1016/j.ipm.2021.102682 (online publication date: 1 November 2021).
https://dl.acm.org/doi/10.1016/j.ipm.2021.102682
G. Torbati, A. Yates, and G. Weikum, "Personalized Entity Search by Sparse and Scrutable User Profiles," in Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, (H. O'Brien, L. Freund, I. Arapakis, O. Hoeber, and I. Lopatovska, eds.), pp. 427-431, 2020. doi:10.1145/3343413.3378011 (online publication date: 14 March 2020).
https://dl.acm.org/doi/10.1145/3343413.3378011
I. Mele, N. Tonellotto, O. Frieder, and R. Perego, "Topical result caching in web search engines," Information Processing and Management: an International Journal, vol. 57, no. 3, 2020. doi:10.1016/j.ipm.2019.102193 (online publication date: 1 May 2020).
https://dl.acm.org/doi/10.1016/j.ipm.2019.102193
I. Markov and M. de Rijke, "What Should We Teach in Information Retrieval?" ACM SIGIR Forum, vol. 52, no. 2, pp. 19-39, 2019. doi:10.1145/3308774.3308780 (online publication date: 17 January 2019).
https://dl.acm.org/doi/10.1145/3308774.3308780
J. Kiesel, A. Bahrami, B. Stein, A. Anand, and M. Hagen, "Clarifying False Memories in Voice-based Search," in Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, (L. Azzopardi, M. Halvey, I. Ruthven, H. Joho, V. Murdock, and P. Qvarfordt, eds.), pp. 331-335, 2019. doi:10.1145/3295750.3298961 (online publication date: 8 March 2019).
https://dl.acm.org/doi/10.1145/3295750.3298961