
Mining Query Logs: Turning Search Usage Data into Knowledge

Published: 01 January 2010

Abstract

Since they started operating, Web search engines have stored information about their users in logs. This information serves many purposes. The primary focus of this survey is to introduce the discipline of query mining by presenting its foundations and analyzing the basic algorithms and techniques used to extract useful knowledge from this (potentially) infinite source of information. We show how search applications may benefit from this kind of analysis by examining popular applications of query log mining and their influence on user experience. We conclude the paper by briefly presenting some of the most challenging open problems in this field.

References

[1]
E. Adar, "User 4xxxxx9: Anonymizing query logs," in Query Log Analysis: Social And Technological Challenges. A Workshop at the 16th International World Wide Web Conference (WWW 2007), (E. Amitay, C. G. Murray, and J. Teevan, eds.), May 2007.
[2]
E. Adar, D. S. Weld, B. N. Bershad, and S. S. Gribble, "Why we search: Visualizing and predicting user behavior," in WWW '07: Proceedings of the 16th International Conference on World Wide Web, pp. 161-170, New York, NY, USA: ACM, 2007.
[3]
A. Agarwal and S. Chakrabarti, "Learning random walks to rank nodes in graphs," in ICML '07: Proceedings of the 24th International Conference on Machine Learning, pp. 9-16, New York, NY, USA: ACM, 2007.
[4]
E. Agichtein, E. Brill, and S. Dumais, "Improving web search ranking by incorporating user behavior information," in SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 19-26, New York, NY, USA: ACM, 2006.
[5]
E. Agichtein, E. Brill, S. Dumais, and R. Ragno, "Learning user interaction models for predicting web search result preferences," in SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3-10, New York, NY, USA: ACM, 2006.
[6]
E. Agichtein and Z. Zheng, "Identifying "best bet" web search results by mining past user behavior," in KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 902- 908, New York, NY, USA: ACM, 2006.
[7]
R. Agrawal, T. Imielinski, and A. N. Swami, "Mining association rules between sets of items in large databases," in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 26-28, 1993, (P. Buneman and S. Jajodia, eds.), pp. 207-216, ACM Press, 1993.
[8]
F. Ahmad and G. Kondrak, "Learning a spelling error model from search query logs," in Proceedings of the 2005 Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pp. 955-962, Vancouver, Canada: Association for Computational Linguistic, October 2005.
[9]
C. Anderson, The Long Tail. Random House Business, 2006.
[10]
A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, and S. Raghavan, "Searching the web," ACM Transactions on Internet Technology, vol. 1, no. 1, pp. 2-43, 2001.
[11]
V. Authors, "About web analytics association," Retrieved on August 2009. http://www.webanalyticsassociation.org/aboutus/.
[12]
R. Baeza-Yates, Web Mining: Applications and Techniques. ch. Query Usage Mining in Search Engines, pp. 307-321, Idea Group, 2004.
[13]
R. Baeza-Yates, "Algorithmic challenges in web search engines," in Proceedings of the 7th Latin American Symposium on Theoretical Informatics (LATIN'06), pp. 1-7, Valdivia, Chile, 2006.
[14]
R. Baeza-Yates, C. Castillo, F. Junqueira, V. Plachouras, and F. Silvestri, "Challenges in distributed information retrieval," in International Conference on Data Engineering (ICDE), Istanbul, Turkey: IEEE CS Press, April 2007.
[15]
R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri, "The impact of caching on search engines," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 183-190, New York, NY, USA: ACM, 2007.
[16]
R. Baeza-Yates, A. Gionis, F. P. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri, "Design trade-offs for search engine caching," ACM Transactions on the Web, vol. 2, no. 4, pp. 1-28, 2008.
[17]
R. Baeza-Yates, C. Hurtado, and M. Mendoza, Query Recommendation Using Query Logs in Search Engines. pp. 588-596. Vol. 3268/2004 of Lecture Notes in Computer Science, Berlin, Heidelberg: Springer, November 2004.
[18]
R. Baeza-Yates, C. Hurtado, and M. Mendoza, "Ranking boosting based in query clustering," in Proceedings of 2004 Atlantic Web Intelligence Conference, Cancun, Mexico, 2004.
[19]
R. Baeza-Yates and A. Tiberi, "Extracting semantic relations from query logs," in KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 76-85, New York, NY, USA: ACM, 2007.
[20]
R. A. Baeza-Yates, "Applications of web query mining," in Advances in Information Retrieval, 27th European Conference on IR Research, ECIR 2005, Santiago de Compostela, Spain, March 21-23, 2005, Proceedings, (D. E. Losada and J. M. Fernández-Luna, eds.), pp. 7-22, Springer, 2005.
[21]
R. A. Baeza-Yates, "Graphs from search engine queries," in SOFSEM 2007: Theory and Practice of Computer Science, 33rd Conference on Current Trends in Theory and Practice of Computer Science, Harrachov, Czech Republic, January 20-26, 2007, Proceedings, (J. van Leeuwen, G. F. Italiano, W. van der Hoek, C. Meinel, H. Sack, and F. Plasil, eds.), pp. 1-8, Springer, 2007.
[22]
R. A. Baeza-Yates, C. A. Hurtado, and M. Mendoza, "Improving search engines by query clustering," JASIST, vol. 58, no. 12, pp. 1793-1804, 2007.
[23]
R. A. Baeza-Yates, C. A. Hurtado, M. Mendoza, and G. Dupret, "Modeling user search behavior," in Third Latin American Web Congress (LAWeb 2005), 1 October - 2 November 2005, Buenos Aires, Argentina, pp. 242-251, IEEE Computer Society, 2005.
[24]
R. A. Baeza-Yates, F. Junqueira, V. Plachouras, and H. F. Witschel, "Admission policies for caches of search engine results," in SPIRE, pp. 74-85, 2007.
[25]
R. A. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999.
[26]
R. A. Baeza-Yates and F. Saint-Jean, "A three level search engine index based in query log distribution," in SPIRE, pp. 56-65, 2003.
[27]
J. Bar-Ilan, "Access to query logs -- an academic researcher's point of view," in Query Log Analysis: Social And Technological Challenges. A Workshop at the 16th International World Wide Web Conference (WWW 2007), (E. Amitay, C. G. Murray, and J. Teevan, eds.), May 2007.
[28]
Z. Bar-Yossef and M. Gurevich, "Mining search engine query logs via suggestion sampling," Proceedings of the VLDB Endowment, vol. 1, no. 1, pp. 54-65, 2008.
[29]
R. Baraglia, F. Cacheda, V. Carneiro, F. Diego, V. Formoso, R. Perego, and F. Silvestri, "Search shortcuts: A new approach to the recommendation of queries," in RecSys '09: Proceedings of the 2009 ACM Conference on Recommender Systems, New York, NY, USA: ACM, 2009.
[30]
R. Baraglia, F. Cacheda, V. Carneiro, V. Formoso, R. Perego, and F. Silvestri, "Search shortcuts: Driving users towards their goals," in WWW '09: Proceedings of the 18th International Conference on World Wide Web, pp. 1073-1074, New York, NY, USA: ACM, 2009.
[31]
R. Baraglia, F. Cacheda, V. Carneiro, V. Formoso, R. Perego, and F. Silvestri, "Search shortcuts using click-through data," in WSCD '09: Proceedings of the 2009 Workshop on Web Search Click Data, pp. 48-55, New York, NY, USA: ACM, 2009.
[32]
R. Baraglia and F. Silvestri, "Dynamic personalization of web sites without user intervention," Communications of the ACM, vol. 50, no. 2, pp. 63-67, 2007.
[33]
L. A. Barroso, J. Dean, and U. Hölzle, "Web search for a planet: The google cluster architecture," IEEE Micro, vol. 23, no. 2, pp. 22-28, 2003.
[34]
S. M. Beitzel, E. C. Jensen, A. Chowdhury, O. Frieder, and D. Grossman, "Temporal analysis of a very large topically categorized web query log," Journal of the American Society for Information Science and Technology, vol. 58, no. 2, pp. 166-178, 2007.
[35]
S. M. Beitzel, E. C. Jensen, A. Chowdhury, D. Grossman, and O. Frieder, "Hourly analysis of a very large topically categorized web query log," in SIGIR '04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 321-328, New York, NY, USA: ACM, 2004.
[36]
S. M. Beitzel, E. C. Jensen, O. Frieder, D. D. Lewis, A. Chowdhury, and A. Kolcz, "Improving automatic query classification via semi-supervised learning," in ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining, pp. 42-49, Washington, DC, USA: IEEE Computer Society, 2005.
[37]
S. M. Beitzel, E. C. Jensen, D. D. Lewis, A. Chowdhury, and O. Frieder, "Automatic classification of web queries using very large unlabeled query logs," ACM Transactions on Information Systems, vol. 25, no. 2, p. 9, 2007.
[38]
L. A. Belady, "A study of replacement algorithms for a virtual storage computer," IBM Systems Journal, vol. 5, no. 2, pp. 78-101, 1966.
[39]
R. E. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
[40]
"Beowulf Project at CESDIS," http://www.beowulf.org.
[41]
M. Bilenko and R. W. White, "Mining the search trails of surfing crowds: Identifying relevant websites from user activity," in WWW '08: Proceeding of the 17th International Conference on World Wide Web, pp. 51-60, New York, NY, USA: ACM, 2008.
[42]
B. Billerbeck, F. Scholer, H. E. Williams, and J. Zobel, "Query expansion using associated queries," in Proceedings of the twelfth international conference on information and knowledge management, pp. 2-9, ACM Press, 2003.
[43]
P. Boldi and S. Vigna, "The webgraph framework i: Compression techniques," in WWW '04: Proceedings of the 13th International Conference on World Wide Web, pp. 595-602, New York, NY, USA: ACM Press, 2004.
[44]
J. Boyan, D. Freitag, and T. Joachims, "A machine learning architecture for optimizing web search engines," in Proceedings of the AAAI Workshop on Internet-Based Information Systems, 1996.
[45]
O. Boydell and B. Smyth, "Capturing community search expertise for personalized web search using snippet-indexes," in CIKM '06: Proceedings of the 15th ACM International Conference on Information and Knowledge Management, pp. 277-286, New York, NY, USA: ACM, 2006.
[46]
J. S. Breese, D. Heckerman, and C. M. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in UAI, pp. 43-52, 1998.
[47]
S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," in WWW7: Proceedings of the Seventh International Conference on World Wide Web 7, pp. 107-117, Amsterdam, The Netherlands: Elsevier Science Publishers B.V., 1998.
[48]
A. Z. Broder, "A taxonomy of web search," SIGIR Forum, vol. 36, no. 2, pp. 3-10, 2002.
[49]
A. Z. Broder, M. Fontoura, E. Gabrilovich, A. Joshi, V. Josifovski, and T. Zhang, "Robust classification of rare queries using web knowledge," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 231- 238, New York, NY, USA: ACM, 2007.
[50]
A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig, "Syntactic Clustering of the Web," in Selected Papers from the Sixth International Conference on World Wide Web, pp. 1157-1166, Essex, UK: Elsevier Science Publishers Ltd., 1997.
[51]
C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender, "Learning to rank using gradient descent," in ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pp. 89-96, New York, NY, USA: ACM, 2005.
[52]
C. J. C. Burges, R. Ragno, and Q. V. Le, "Learning to rank with nonsmooth cost functions.," in NIPS, (B. Schölkopf, J. Platt, and T. Hoffman, eds.), pp. 193-200, MIT Press, 2006.
[53]
R. Buyya, ed., High Performance Cluster Computing. Prentice Hall PTR, 1999.
[54]
H. C. by Thomas, E. L. Charles, L. R. Ronald, and S. Clifford, Introduction to Algorithms. The MIT Press, 2001.
[55]
J. Callan and M. Connell, "Query-based sampling of text databases," ACM Transactions on Information Systems, vol. 19, no. 2, pp. 97-130, 2001.
[56]
J. P. Callan, Z. Lu, and W. B. Croft, "Searching distributed collections with inference networks," in SIGIR '95: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 21-28, New York, NY, USA: ACM, 1995.
[57]
C. Castillo, "Effective web crawling," PhD thesis, Department of Computer Science -- University of Chile, Santiago, Chile, November 2004.
[58]
J. Caverlee, L. Liu, and J. Bae, "Distributed query sampling: A quality-conscious approach," in SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 340-347, New York, NY, USA: ACM, 2006.
[59]
D. Chakrabarti, R. Kumar, and A. Tomkins, "Evolutionary clustering," in KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 554-560, New York, NY, USA: ACM, 2006.
[60]
Q. Chen, M. Li, and M. Zhou, "Improving query spelling correction using web search results," in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 181-189, Prague, Czech Republic: Association for Computational Linguistic, June 2007.
[61]
F. Chierichetti, A. Panconesi, P. Raghavan, M. Sozio, A. Tiberi, and E. Upfal, "Finding near neighbors through cluster pruning," in Proceedings of ACM SIGMOD/PODS 2007 Conference, 2007.
[62]
P. A. Chirita, C. S. Firan, and W. Nejdl, "Personalized query expansion for the web," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 7-14, New York, NY, USA: ACM, 2007.
[63]
A. Chowdhury, O. Frieder, D. Grossman, and M. C. McCabe, "Collection statistics for fast duplicate document detection," ACM Transactions on Information Systems, vol. 20, no. 2, pp. 171-191, 2002.
[64]
A. Cooper, "A survey of query log privacy-enhancing techniques from a policy perspective," ACM Transactions on the Web, vol. 2, no. 4, pp. 1-27, 2008.
[65]
N. Craswell, P. Bailey, and D. Hawking, "Server selection on the world wide web," in DL '00: Proceedings of the Fifth ACM Conference on Digital Libraries, pp. 37-46, New York, NY, USA: ACM, 2000.
[66]
N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey, "An experimental comparison of click position-bias models," in WSDM '08: Proceedings of the International Conference on Web Search and Web Data Mining, pp. 87-94, New York, NY, USA: ACM, 2008.
[67]
S. Cucerzan and E. Brill, "Spelling correction as an iterative process that exploits the collective knowledge of web users," in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), pp. 293-300, July 2004.
[68]
S. Cucerzan and R. W. White, "Query suggestion based on user landing pages," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 875-876, New York, NY, USA: ACM Press, 2007.
[69]
H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma, "Probabilistic query expansion using query logs," in WWW '02: Proceedings of the 11th International Conference on World Wide Web, pp. 325-332, New York, NY, USA: ACM, 2002.
[70]
E. Cutrell and Z. Guan, "What are you looking for? An eye-tracking study of information usage in web search," in CHI '07: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 407-416, New York, NY, USA: ACM, 2007.
[71]
F. J. Damerau, "A technique for computer detection and correction of spelling errors," Communications of the ACM, vol. 7, no. 3, pp. 171-176, 1964.
[72]
I. S. Dhillon, S. Mallela, and D. S. Modha, "Information-theoretic co-clustering," in Proceedings of The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp. 89-98, 2003.
[73]
Z. Dou, R. Song, and J. Wen, "A large-scale evaluation and analysis of personalized search strategies," in Proceedings of the 16th International World Wide Web Conference (WWW2007), pp. 572-581, May 2007.
[74]
T. Fagni, R. Perego, F. Silvestri, and S. Orlando, "Boosting the performance of web search engines: Caching and prefetching query results by exploiting historical usage data," ACM Transactions on Information Systems, vol. 24, no. 1, pp. 51-78, 2006.
[75]
C. H. Fenichel, "Online searching: Measures that discriminate among users with different types of experience," JASIS, vol. 32, no. 1, pp. 23-32, 1981.
[76]
P. Ferragina and A. Gulli, "A personalized search engine based on web-snippet hierarchical clustering," in WWW '05: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, pp. 801-810, New York, NY, USA: ACM, 2005.
[77]
L. Fitzpatrick and M. Dent, "Automatic feedback using past queries: social searching?," in SIGIR '97: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 306-313, New York, NY, USA: ACM, 1997.
[78]
B. M. Fonseca, P. B. Golgher, E. S. de Moura, and N. Ziviani, "Using association rules to discover search engines related queries," in LA-WEB '03: Proceedings of the First Conference on Latin American Web Congress, p. 66, Washington, DC, USA: IEEE Computer Society, 2003.
[79]
I. Foster and C. Kesselman, eds., The Grid: Blueprint for a Future Computing Infrastructure. Morgan-Kaufmann, 1999.
[80]
S. T. I. Foster and C. Kesselman, "The anatomy of the grid: Enabling scalable virtual organization," Int'l Journal on Supercomputer Application, vol. 3, no. 15.
[81]
Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, "An efficient boosting algorithm for combining preferences," Journal of Machine Learning Research, vol. 4, pp. 933-969, 2003.
[82]
N. Fuhr, "Optimal polynomial retrieval functions based on the probability ranking principle," ACM Transactions on Information Systems, vol. 7, no. 3, pp. 183-204, 1989.
[83]
N. Fuhr, "A decision-theoretic approach to database selection in networked ir," ACM Transactions on Information Systems, vol. 17, no. 3, pp. 229-249, 1999.
[84]
N. Fuhr, S. Hartmann, G. Knorz, G. Lustig, M. Schwantner, and K. Tzeras, "AIR/X--a rule-based multistage indexing system for large subject fields," in Proceedings of the RIAO'91, Barcelona, Spain, April 2-5, 1991, pp. 606-623, 1991.
[85]
G. P. C. Fung, J. X. Yu, P. S. Yu, and H. Lu, "Parameter free bursty events detection in text streams," in VLDB '05: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 181-192, VLDB Endowment, 2005.
[86]
G. W. Furnas, S. C. Deerwester, S. T. Dumais, T. K. Landauer, R. A. Harshman, L. A. Streeter, and K. E. Lochbaum, "Information retrieval using a singular value decomposition model of latent semantic structure," in SIGIR, pp. 465-480, 1988.
[87]
G. Galilei, "Discorsi e dimostrazioni matematiche intorno a due nuove scienze," Leida : Appresso gli Elsevirii, 1638.
[88]
L. A. Granka, T. Joachims, and G. Gay, "Eye-tracking analysis of user behavior in www search," in SIGIR '04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information retrieval, pp. 478-479, New York, NY, USA: ACM, 2004.
[89]
L. Gravano, H. Garcia-Molina, and A. Tomasic, "The efficacy of gloss for the text database discovery problem," Technical Report, Stanford University, Stanford, CA, USA, 1993.
[90]
L. Gravano, H. García-Molina, and A. Tomasic, "The effectiveness of gioss for the text database discovery problem," in SIGMOD '94: Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, pp. 126-137, New York, NY, USA: ACM, 1994.
[91]
L. Gravano, H. García-Molina, and A. Tomasic, "Gloss: text-source discovery over the internet," ACM Transactions on Database Systems, vol. 24, no. 2, pp. 229-264, 1999.
[92]
L. Gravano, V. Hatzivassiloglou, and R. Lichtenstein, "Categorizing web queries according to geographical locality," in CIKM '03: Proceedings of the Twelfth International Conference on Information and Knowledge Management, pp. 325-333, New York, NY, USA: ACM, 2003.
[93]
Z. Guan and E. Cutrell, "An eye tracking study of the effect of target rank on web search," in CHI '07: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 417-420, New York, NY, USA: ACM, 2007.
[94]
T. H. Haveliwala, "Topic-sensitive pagerank," in WWW '02: Proceedings of the 11th International Conference on World Wide Web, pp. 517-526, New York, NY, USA: ACM, 2002.
[95]
D. Hawking, "Overview of the trec-9 web track," in TREC, 2000.
[96]
D. Hawking, "Web search engines: Part 1," Computer, vol. 39, no. 6, pp. 86-88, 2006.
[97]
D. Hawking, "Web search engines: Part 2," Computer, vol. 39, no. 8, pp. 88-90, 2006.
[98]
D. Hawking and P. Thistlewaite, "Methods for information server selection," ACM Transactions on Information Systems, vol. 17, no. 1, pp. 40-76, 1999.
[99]
J. Hennessy and D. Patterson, Computer Architecture -- A Quantitative Approach. Morgan Kaufmann, 2003.
[100]
M. R. Henzinger, "Algorithmic challenges in web search engines," Internet Mathematics, vol. 1, no. 1, 2003.
[101]
M. R. Henzinger, R. Motwani, and C. Silverstein, "Challenges in web search engines," SIGIR Forum, vol. 36, no. 2, pp. 11-22, 2002.
[102]
T. C. Hoad and J. Zobel, "Methods for identifying versioned and plagiarized documents," Journal of the American Society for Information Science and Technology, vol. 54, no. 3, pp. 203-215, 2003.
[103]
I. Hsieh-Yee, "Effects of search experience and subject knowledge on the search tactics of novice and experienced searchers," JASIS, vol. 44, no. 3, pp. 161-174, 1993.
[104]
A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: A review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[105]
B. J. Jansen and M. Resnick, "An examination of searcher's perceptions of nonsponsored and sponsored links during ecommerce web searching," Journal of the American Society for Information Science and Technology, vol. 57, no. 14, pp. 1949-1961, 2006.
[106]
B. J. Jansen and A. Spink, "An analysis of web searching by european alltheweb.com users," Information Processing and Management, vol. 41, no. 2, pp. 361-381, 2005.
[107]
B. J. Jansen and A. Spink, "How are we searching the world wide web? A comparison of nine search engine transaction logs," Information Processing and Management, vol. 42, no. 1, pp. 248-263, 2006.
[108]
B. J. Jansen, A. Spink, J. Bateman, and T. Saracevic, "Real life information retrieval: A study of user queries on the web," SIGIR Forum, vol. 32, no. 1, pp. 5-17, 1998.
[109]
B. J. Jansen, A. Spink, and S. Koshman, "Web searcher interaction with the dogpile.com metasearch engine," JASIST, vol. 58, no. 5, pp. 744-755, 2007.
[110]
B. J. J. Jansen, "Understanding user-web interactions via web analytics," Synthesis Lectures on Information Concepts, Retrieval, and Services, vol. 1, no. 1, pp. 1-102, 2009.
[111]
T. Joachims, "Optimizing search engines using clickthrough data," in KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 133-142, New York, NY, USA: ACM Press, 2002.
[112]
T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay, "Accurately interpreting clickthrough data as implicit feedback," in SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 154-161, New York, NY, USA: ACM, 2005.
[113]
T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay, "Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search," ACM Transactions on Information Systems, vol. 25, no. 2, p. 7, 2007.
[114]
T. Joachims, H. Li, T.-Y. Liu, and C. Zhai, "Learning to rank for information retrieval (lr4ir 2007)," SIGIR Forum, vol. 41, no. 2, pp. 58-62, 2007.
[115]
T. Joachims and F. Radlinski, "Search engines that learn from implicit feedback," Computer, vol. 40, no. 8, pp. 34-40, 2007.
[116]
K. S. Jones, S. Walker, and S. E. Robertson, "A probabilistic model of information retrieval: Development and comparative experiments," Information Processing and Management, vol. 36, no. 6, pp. 779-808, 2000.
[117]
R. Jones, R. Kumar, B. Pang, and A. Tomkins, ""I know what you did last summer": Query logs and user privacy," in CIKM '07: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, pp. 909-914, New York, NY, USA: ACM, 2007.
[118]
R. Jones, B. Rey, O. Madani, and W. Greiner, "Generating query substitutions," in WWW '06: Proceedings of the 15th International Conference on World Wide Web, pp. 387-396, New York, NY, USA: ACM Press, 2006.
[119]
R. Karedla, J. S. Love, and B. G. Wherry, "Caching strategies to improve disk system performance," Computer, vol. 27, no. 3, pp. 38-46, 1994.
[120]
M. Kendall, Rank Correlation Methods. Hafner, 1955.
[121]
J. Kleinberg, "Bursty and hierarchical structure in streams," in KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 91-101, New York, NY, USA: ACM, 2002.
[122]
J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," Journal of the ACM, vol. 46, no. 5, pp. 604-632, 1999.
[123]
S. Koshman, A. Spink, and B. J. Jansen, "Web searching on the vivisimo search engine," JASIST, vol. 57, no. 14, pp. 1875-1887, 2006.
[124]
M. Koster, "Aliweb: Archie-like indexing in the web," Computer Networks and ISDN Systems, vol. 27, no. 2, pp. 175-182, 1994.
[125]
S. Kullback and R. A. Leibler, "On information and sufficiency," Annals of Mathematical Statistics, vol. 22, pp. 49-86, 1951.
[126]
R. Kumar, J. Novak, B. Pang, and A. Tomkins, "On anonymizing query logs via token-based hashing," in WWW '07: Proceedings of the 16th International Conference on World Wide Web, pp. 629-638, New York, NY, USA: ACM, 2007.
[127]
T. Lau and E. Horvitz, "Patterns of search: analyzing and modeling web query refinement," in UM '99: Proceedings of the Seventh International Conference on User Modeling, pp. 119-128, Secaucus, NJ, USA: Springer-Verlag New York, Inc., 1999.
[128]
U. Lee, Z. Liu, and J. Cho, "Automatic identification of user goals in web search," in WWW '05: Proceedings of the 14th International Conference on World Wide Web, pp. 391-400, New York, NY, USA: ACM, 2005.
[129]
R. Lempel and S. Moran, "Predictive caching and prefetching of query results in search engines," in WWW '03: Proceedings of the 12th International Conference on World Wide Web, pp. 19-28, New York, NY, USA: ACM, 2003.
[130]
R. Lempel and S. Moran, "Competitive caching of query results in search engines," Theoretical Computer Science, vol. 324, no. 2-3, pp. 253-271, 2004.
[131]
R. Lempel and S. Moran, "Optimizing result prefetching in web search engines with segmented indices," ACM Transactions on Internet Technology, vol. 4, no. 1, pp. 31-59, 2004.
[132]
R. Lempel and F. Silvestri, "Web search result caching and prefetching," Encyclopedia of Database Systems, Springer Verlag, 2008.
[133]
M. Li, Y. Zhang, M. Zhu, and M. Zhou, "Exploring distributional similarity based models for query spelling correction," in ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 1025- 1032, Morristown, NJ, USA: Association for Computational Linguistics, 2006.
[134]
Y. Li, Z. Zheng, and H. K. Dai, "Kdd cup-2005 report: facing a great challenge," SIGKDD Explorations Newsletter, vol. 7, no. 2, pp. 91-99, 2005.
[135]
F. Liu, C. Yu, and W. Meng, "Personalized web search by mapping user queries to categories," in CIKM '02: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 558-565, New York, NY, USA: ACM Press, 2002.
[136]
Live Search Team at Microsoft, "Local, relevance, and japan!," http://blogs.msdn.com/livesearch/archive/2005/06/21/431288.aspx, 2005.
[137]
X. Long and T. Suel, "Three-level caching for efficient query processing in large web search engines," in WWW '05: Proceedings of the 14th International Conference on World Wide Web, pp. 257-266, New York, NY, USA: ACM, 2005.
[138]
R. M. Losee and L. C. Jr., "Information retrieval with distributed databases: Analytic models of performance," IEEE Transactions on Parallel & Distributed Systems, vol. 15, no. 1, pp. 18-27, 2004.
[139]
C. Lucchese, S. Orlando, R. Perego, and F. Silvestri, "Mining query logs to optimize index partitioning in parallel web search engines," in InfoScale '07: Proceedings of the 2nd International Conference on Scalable Information Systems, New York, NY, USA: ACM, 2007.
[140]
T.-Y. Lui, "Learning to rank for information retrieval," Foundations and Trends in Information Retrieval, vol. 3, no. 3, 2008.
[141]
Y. Lv, L. Sun, J. Zhang, J.-Y. Nie, W. Chen, and W. Zhang, "An iterative implicit feedback approach to personalized search," in ACL '06: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL, pp. 585-592, Morristown, NJ, USA: Association for Computational Linguistics, 2006.
[142]
C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999.
[143]
M. Marchiori, "The quest for correct information on the web: Hyper search engines," Computer Networks, vol. 29, no. 8-13, pp. 1225-1236, 1997.
[144]
E. P. Markatos, "On caching search engine query results," Computer Communications, vol. 24, pp. 137-143, 1 February 2000.
[145]
M. Mat-Hassan and M. Levene, "Associating search and navigation behavior through log analysis: Research articles," Journal of the American Society for Information Science and Technology, vol. 56, no. 9, pp. 913-934, 2005.
[146]
O. A. McBryan, "Genvl and wwww: Tools for taming the web," in Proceedings of the First International World Wide Web Conference, (O. Nierstarsz, ed.), p. 15, CERN, Geneva, 1994.
[147]
S. Melink, S. Raghavan, B. Yang, and H. Garcia-Molina, "Building a distributed full-text index for the web," ACM Transactions on Information Systems, vol. 19, no. 3, pp. 217-241, 2001.
[148]
T. Mitchell, Machine Learning. McGraw-Hill International Editions, 1997.
[149]
A. Moffat, W. Webber, and J. Zobel, "Load balancing for term-distributed parallel retrieval," in SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 348-355, New York, NY, USA: ACM, 2006.
[150]
A. Moffat, W. Webber, J. Zobel, and R. Baeza-Yates, "A pipelined architecture for distributed text query evaluation," Information Retrieval, vol. 10, no. 3, pp. 205-231, 2007.
[151]
A. Moffat and J. Zobel, "Information retrieval systems for large document collections," in TREC, 1994.
[152]
E. J. O'Neil, P. E. O'Neil, and G. Weikum, "An optimality proof of the lruk page replacement algorithm," Journal of the ACM, vol. 46, no. 1, pp. 92-112, 1999.
[153]
S. Orlando, R. Perego, and F. Silvestri, "Design of a parallel and distributed WEB search engine," in Proceedings of Parallel Computing (ParCo) 2001 conference, Imperial College Press, September 2001.
[154]
H. C. Ozmutlu, A. Spink, and S. Ozmutlu, "Analysis of large data logs: An application of poisson sampling on excite web queries," Information Processing and Management, vol. 38, no. 4, pp. 473-490, 2002.
[155]
S. Ozmutlu, H. C. Ozmutlu, and A. Spink, "Multitasking web searching and implications for design," JASIST, vol. 40, no. 1, pp. 416-421, 2003.
[156]
S. Ozmutlu, A. Spink, and H. C. Ozmutlu, "A day in the life of web searching: An exploratory study," Information Processing and Management, vol. 40, no. 2, pp. 319-345, 2004.
[157]
L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Technical Report, Stanford Digital Library Technologies Project, 1998.
[158]
S. Pandey and C. Olston, "User-centric web crawling," in WWW '05: Proceedings of the 14th International Conference on World Wide Web, pp. 401-411, New York, NY, USA: ACM, 2005.
[159]
S. Pandey and C. Olston, "Crawl ordering by search impact," in WSDM '08: Proceedings of the International Conference on Web Search and Web Data Mining, pp. 3-14, New York, NY, USA: ACM, 2008.
[160]
G. Pass, A. Chowdhury, and C. Torgeson, "A picture of search," in InfoScale '06: Proceedings of the First International Conference on Scalable Information Systems, p. 1, New York, NY, USA: ACM, 2006.
[161]
"Pew Research Center for the People & the Press," WWW page, 2007. http://people-press.org/.
[162]
J. Piskorski and M. Sydow, "String distance metrics for reference matching and search query correction," in Business Information Systems, 10th International Conference, BIS 2007, Poznan, Poland, April 2007, (W. Abramowicz, ed.), pp. 356-368, Springer-Verlag, 2007.
[163]
J. Pitkow, H. Schütze, T. Cass, R. Cooley, D. Turnbull, A. Edmonds, E. Adar, and T. Breuel, "Personalized search," Communications of the ACM, vol. 45, no. 9, pp. 50-55, 2002.
[164]
B. Poblete, M. Spiliopoulou, and R. Baeza-Yates, "Website privacy preservation for query log publishing," in First International Workshop on Privacy, Security, and Trust in KDD (PINKDD'07), August 2007.
[165]
S. Podlipnig and L. Böszörmenyi, "A survey of web cache replacement strategies," ACM Computing Surveys, vol. 35, no. 4, pp. 374-398, 2003.
[166]
A. L. Powell and J. C. French, "Comparing the performance of collection selection algorithms," ACM Transactions on Information Systems, vol. 21, no. 4, pp. 412-456, 2003.
[167]
A. L. Powell, J. C. French, J. Callan, M. Connell, and C. L. Viles, "The impact of database selection on distributed searching," in SIGIR '00: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 232-239, New York, NY, USA: ACM, 2000.
[168]
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C. Cambridge University Press, Second ed., 1992.
[169]
D. Puppin, "A search engine architecture based on collection selection," PhD thesis, Dipartimento di Informatica, Università di Pisa, Pisa, Italy, December 2007.
[170]
D. Puppin and F. Silvestri, "The query-vector document model," in CIKM '06: Proceedings of the 15th ACM International Conference on Information and Knowledge Management, pp. 880-881, New York, NY, USA: ACM, 2006.
[171]
D. Puppin, F. Silvestri, and D. Laforenza, "Query-driven document partitioning and collection selection," in InfoScale '06: Proceedings of the First International Conference on Scalable Information Systems, p. 34, New York, NY, USA: ACM, 2006.
[172]
D. Puppin, F. Silvestri, R. Perego, and R. Baeza-Yates, "Tuning the capacity of search engines: Load-driven routing and incremental caching to reduce and balance the load," ACM Transactions on Information Systems.
[173]
D. Puppin, F. Silvestri, R. Perego, and R. Baeza-Yates, "Load-balancing and caching for collection selection architectures," in InfoScale '07: Proceedings of the 2nd International Conference on Scalable Information Systems, New York, NY, USA: ACM, 2007.
[174]
F. Qiu and J. Cho, "Automatic Identification of User Interest for Personalized Search," in WWW '06: Proceedings of the 15th International Conference on World Wide Web, pp. 727-736, New York, NY, USA: ACM, 2006.
[175]
F. Radlinski and T. Joachims, "Query chains: learning to rank from implicit feedback," in KDD '05: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 239-248, New York, NY, USA: ACM Press, 2005.
[176]
F. Radlinski and T. Joachims, "Active exploration for learning rankings from clickthrough data," in KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 570-579, New York, NY, USA: ACM, 2007.
[177]
K. H. Randall, R. Stata, J. L. Wiener, and R. G. Wickremesinghe, "The link database: Fast access to graphs of the web," in DCC '02: Proceedings of the Data Compression Conference (DCC '02), p. 122, Washington, DC, USA: IEEE Computer Society, 2002.
[178]
S. E. Robertson and S. Walker, "Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval," in SIGIR '94: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 232-241, New York, NY, USA: Springer-Verlag New York, Inc., 1994.
[179]
S. E. Robertson and S. Walker, "Okapi/Keenbow at TREC-8," in TREC, 1999.
[180]
J. T. Robinson and M. V. Devarakonda, "Data cache management using frequency-based replacement," SIGMETRICS Performance Evaluation Review, vol. 18, no. 1, pp. 134-142, 1990.
[181]
J. Rocchio, Relevance Feedback in Information Retrieval. Prentice-Hall, 1971.
[182]
G. Salton and C. Buckley, "Parallel text search methods," Communications of the ACM, vol. 31, no. 2, pp. 202-215, 1988.
[183]
G. Salton and C. Buckley, "Improving retrieval performance by relevance feedback," JASIS, vol. 41, no. 4, pp. 288-297, 1990.
[184]
G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. New York, NY, USA: McGraw-Hill, Inc., 1986.
[185]
M. Sanderson and S. T. Dumais, "Examining repetition in user search behavior," in ECIR, pp. 597-604, 2007.
[186]
P. C. Saraiva, E. S. de Moura, N. Ziviani, W. Meira, R. Fonseca, and B. Ribeiro-Neto, "Rank-preserving two-level caching for scalable search engines," in SIGIR '01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 51-58, New York, NY, USA: ACM, 2001.
[187]
F. Scholer, H. E. Williams, and A. Turpin, "Query association surrogates for web search," Journal of the American Society for Information Science and Technology, vol. 55, no. 7, pp. 637-650, 2004.
[188]
"Search engine use shoots up in the past year and edges towards email as the primary internet application," WWW page, 2005. http://www.pewinternet.org/pdfs/PIP_SearchData_1105.pdf.
[189]
"Search engine users," WWW page, 2005. http://www.pewinternet.org/pdfs/PIP_Searchengine_users.pdf.
[190]
"Search engine users," White paper, 2005. http://www.enquiroresearch.com/personalization/.
[191]
F. Sebastiani, "Machine learning in automated text categorization," ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.
[192]
D. Shen, R. Pan, J.-T. Sun, J. J. Pan, K. Wu, J. Yin, and Q. Yang, "Q2C@UST: Our winning solution to query classification in KDDCUP 2005," SIGKDD Explorations Newsletter, vol. 7, no. 2, pp. 100-110, 2005.
[193]
X. Shen, B. Tan, and C. Zhai, "UCAIR: A personalized search toolbar," in SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, p. 681, New York, NY, USA: ACM, 2005.
[194]
M. Shokouhi, J. Zobel, and Y. Bernstein, "Distributed text retrieval from overlapping collections," in ADC '07: Proceedings of the Eighteenth Conference on Australasian Database, pp. 141-150, Darlinghurst, Australia: Australian Computer Society, Inc., 2007.
[195]
M. Shokouhi, J. Zobel, S. Tahaghoghi, and F. Scholer, "Using query logs to establish vocabularies in distributed information retrieval," Information Processing and Management, vol. 43, no. 1, pp. 169-180, 2007.
[196]
L. Si and J. Callan, "Using sampled data and regression to merge search engine results," in SIGIR '02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 19-26, New York, NY, USA: ACM, 2002.
[197]
L. Si and J. Callan, "Relevant document distribution estimation method for resource selection," in SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 298-305, New York, NY, USA: ACM, 2003.
[198]
S. Siegfried, M. J. Bates, and D. N. Wilde, "A profile of end-user searching behavior by humanities scholars: The Getty Online Searching Project report no. 2," JASIS, vol. 44, no. 5, pp. 273-291, 1993.
[199]
C. Silverstein, M. Henzinger, H. Marais, and M. Moricz, "Analysis of a very large AltaVista query log," Technical Report, Systems Research Center, 130 Lytton Avenue, Palo Alto, California 94301, 1998.
[200]
C. Silverstein, H. Marais, M. Henzinger, and M. Moricz, "Analysis of a very large web search engine query log," SIGIR Forum, vol. 33, no. 1, pp. 6-12, 1999.
[201]
F. Silvestri, "High performance issues in web search engines: Algorithms and techniques," PhD thesis, Dipartimento di Informatica, Università di Pisa, Pisa, Italy, May 2004.
[202]
F. Silvestri, "Sorting out the document identifier assignment problem," in Proceedings of the 29th European Conference on Information Retrieval, April 2007.
[203]
F. Silvestri, S. Orlando, and R. Perego, "Assigning identifiers to documents to enhance the clustering property of fulltext indexes," in SIGIR '04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 305-312, New York, NY, USA: ACM, 2004.
[204]
F. Silvestri, S. Orlando, and R. Perego, "WINGS: A parallel indexer for web contents," in International Conference on Computational Science, pp. 263-270, 2004.
[205]
D. D. Sleator and R. E. Tarjan, "Amortized efficiency of list update and paging rules," Communications of the ACM, vol. 28, no. 2, pp. 202-208, 1985.
[206]
A. J. Smith, "Cache memories," ACM Computing Surveys, vol. 14, no. 3, pp. 473-530, 1982.
[207]
M. Speretta and S. Gauch, "Personalized search based on user search histories," in Web Intelligence, pp. 622-628, 2005.
[208]
A. Spink, B. J. Jansen, D. Wolfram, and T. Saracevic, "From e-sex to e-commerce: Web search changes," Computer, vol. 35, no. 3, pp. 107-109, 2002.
[209]
A. Spink, S. Koshman, M. Park, C. Field, and B. J. Jansen, "Multitasking web search on Vivisimo.com," in ITCC '05: Proceedings of the International Conference on Information Technology: Coding and Computing, (ITCC'05) Volume II, pp. 486-490, Washington, DC, USA: IEEE Computer Society, 2005.
[210]
A. Spink, H. C. Ozmutlu, and D. P. Lorence, "Web searching for sexual information: An exploratory study," Information Processing and Management, vol. 40, no. 1, pp. 113-123, 2004.
[211]
A. Spink and T. Saracevic, "Interaction in information retrieval: Selection and effectiveness of search terms," JASIS, vol. 48, no. 8, pp. 741-761, 1997.
[212]
A. Spink, D. Wolfram, B. J. Jansen, and T. Saracevic, "Searching the web: the public and their queries," Journal of the American Society for Information Science and Technology, vol. 52, pp. 226-234, February 2001.
[213]
J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, "Web usage mining: Discovery and applications of usage patterns from web data," SIGKDD Explorations, vol. 1, no. 2, pp. 12-23, 2000.
[214]
J. Teevan, E. Adar, R. Jones, and M. Potts, "History repeats itself: Repeat queries in Yahoo's logs," in SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 703-704, New York, NY, USA: ACM, 2006.
[215]
J. Teevan, E. Adar, R. Jones, and M. A. S. Potts, "Information re-retrieval: Repeat queries in Yahoo's logs," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 151-158, New York, NY, USA: ACM, 2007.
[216]
J. Teevan, S. T. Dumais, and E. Horvitz, "Beyond the commons: Investigating the value of personalizing web search," in Proceedings of Workshop on New Technologies for Personalized Information Access (PIA '05), Edinburgh, Scotland, UK, 2005.
[217]
J. Teevan, S. T. Dumais, and E. Horvitz, "Personalizing search via automated analysis of interests and activities," in SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 449-456, New York, NY, USA: ACM Press, 2005.
[218]
"The Associated Press: Internet ad revenue exceeds $21B in 2007," 2008. http://ap.google.com/article/ALeqM5hccYd6ZuXTns2RWXUgh6br4n1UoQD8V1GGC00.
[219]
H. Turtle and J. Flood, "Query evaluation: Strategies and optimizations," Information Processing and Management, vol. 31, no. 6, pp. 831-850, 1995.
[220]
M. van Erp and L. Schomaker, "Variants of the Borda count method for combining ranked classifier hypotheses," in Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, pp. 443-452, International Unipen Foundation, 2000.
[221]
C. J. van Rijsbergen, Information Retrieval. London: Butterworths, 2nd ed., 1979.
[222]
M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos, "Identifying similarities, periodicities and bursts for online search queries," in SIGMOD '04: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pp. 131-142, New York, NY, USA: ACM, 2004.
[223]
M. Vlachos, P. S. Yu, V. Castelli, and C. Meek, "Structural periodic measures for time-series data," Data Mining and Knowledge Discovery, vol. 12, no. 1, pp. 1-28, 2006.
[224]
D. Vogel, S. Bickel, P. Haider, R. Schimpfky, P. Siemen, S. Bridges, and T. Scheffer, "Classifying search engine queries using the web as background knowledge," SIGKDD Explorations Newsletter, vol. 7, no. 2, pp. 117-122, 2005.
[225]
X. Wang and C. Zhai, "Learn from web search logs to organize search results," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 87-94, New York, NY, USA: ACM, 2007.
[226]
R. Weiss, B. Vélez, and M. A. Sheldon, "HyPursuit: A hierarchical network search engine that exploits content-link hypertext clustering," in HYPERTEXT '96: Proceedings of the Seventh ACM Conference on Hypertext, pp. 180-193, New York, NY, USA: ACM, 1996.
[227]
R. W. White, M. Bilenko, and S. Cucerzan, "Studying the use of popular destinations to enhance web search interaction," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 159-166, New York, NY, USA: ACM, 2007.
[228]
R. W. White, M. Bilenko, and S. Cucerzan, "Leveraging popular destinations to enhance web search interaction," ACM Transactions on the Web, vol. 2, no. 3, pp. 1-30, 2008.
[229]
R. W. White and D. Morris, "Investigating the querying and browsing behavior of advanced search engine users," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 255-262, New York, NY, USA: ACM, 2007.
[230]
L. Xiong and E. Agichtein, "Towards privacy-preserving query log publishing," in Query Log Analysis: Social And Technological Challenges. A Workshop at the 16th International World Wide Web Conference (WWW 2007), (E. Amitay, C. G. Murray, and J. Teevan, eds.), May 2007.
[231]
J. Xu and J. Callan, "Effective retrieval with distributed collections," in SIGIR '98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 112-120, New York, NY, USA: ACM, 1998.
[232]
J. Xu and W. B. Croft, "Cluster-based language models for distributed retrieval," in SIGIR '99: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 254-261, New York, NY, USA: ACM, 1999.
[233]
J. Xu and W. B. Croft, "Improving the effectiveness of information retrieval with local context analysis," ACM Transactions on Information Systems, vol. 18, no. 1, pp. 79-112, 2000.
[234]
J. L. Xu and A. Spink, "Web research: The Excite study," in WebNet 2000, pp. 581-585, 2000.
[235]
Yahoo! Grid, "Open source distributed computing: Yahoo's Hadoop support," http://developer.yahoo.net/blog/archives/2007/07/yahoo-hadoop.html, 2007.
[236]
Y. Yang and C. G. Chute, "An example-based mapping method for text categorization and retrieval," ACM Transactions on Information Systems, vol. 12, no. 3, pp. 252-277, 1994.
[237]
Y. Yue, T. Finley, F. Radlinski, and T. Joachims, "A support vector method for optimizing average precision," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 271-278, New York, NY, USA: ACM, 2007.
[238]
B. Yuwono and D. L. Lee, "Server ranking for distributed text retrieval systems on the internet," in Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA), pp. 41-50, World Scientific Press, 1997.
[239]
O. R. Zaïane and A. Strilets, "Finding similar queries to satisfy searches based on query traces," in OOIS Workshops, pp. 207-216, 2002.
[240]
J. Zhang and T. Suel, "Optimized inverted list assignment in distributed search engine architectures," in IPDPS, pp. 1-10, 2007.
[241]
Y. Zhang and A. Moffat, "Some observations on user search behavior," in Proceedings of the 11th Australasian Document Computing Symposium, Brisbane, Australia, 2006.
[242]
Z. Zhang and O. Nasraoui, "Mining search engine query logs for query recommendation," in WWW '06: Proceedings of the 15th International Conference on World Wide Web, pp. 1039-1040, New York, NY, USA: ACM, 2006.
[243]
Q. Zhao, S. C. H. Hoi, T.-Y. Liu, S. S. Bhowmick, M. R. Lyu, and W.-Y. Ma, "Time-dependent semantic similarity measure of queries using historical click-through data," in WWW '06: Proceedings of the 15th International Conference on World Wide Web, pp. 543-552, New York, NY, USA: ACM, 2006.
[244]
Z. Zheng, K. Chen, G. Sun, and H. Zha, "A regression framework for learning ranking functions using relative relevance judgments," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 287-294, New York, NY, USA: ACM, 2007.
[245]
G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, 1949.
[246]
J. Zobel and A. Moffat, "Inverted files for text search engines," ACM Computing Surveys, vol. 38, no. 2, p. 6, 2006.

    Published In

    Foundations and Trends in Information Retrieval, Volume 4, Issue 1-2
    January 2010
    176 pages
    ISSN: 1554-0669
    EISSN: 1554-0677

    Publisher

    Now Publishers Inc.

    Hanover, MA, United States

    Cited By

    • (2023) On the self-adjustment of privacy safeguards for query log streams. Computers and Security, vol. 134, no. C. DOI: 10.1016/j.cose.2023.103450. Online publication date: 1-Nov-2023.
    • (2022) A comparison of dataset search behaviour of internal versus search engine referred sessions. Proceedings of the 2022 Conference on Human Information Interaction and Retrieval, pp. 158-168. DOI: 10.1145/3498366.3505821. Online publication date: 14-Mar-2022.
    • (2021) Indexing Highly Repetitive String Collections, Part II. ACM Computing Surveys, vol. 54, no. 2, pp. 1-32. DOI: 10.1145/3432999. Online publication date: 9-Feb-2021.
    • (2021) CoNotate: Suggesting Queries Based on Notes Promotes Knowledge Discovery. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1-14. DOI: 10.1145/3411764.3445618. Online publication date: 6-May-2021.
    • (2021) Investigating Session Search Behavior with Knowledge Graphs. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1708-1712. DOI: 10.1145/3404835.3463107. Online publication date: 11-Jul-2021.
    • (2021) Adaptive utterance rewriting for conversational search. Information Processing and Management: an International Journal, vol. 58, no. 6. DOI: 10.1016/j.ipm.2021.102682. Online publication date: 1-Nov-2021.
    • (2020) Personalized Entity Search by Sparse and Scrutable User Profiles. Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, pp. 427-431. DOI: 10.1145/3343413.3378011. Online publication date: 14-Mar-2020.
    • (2020) Topical result caching in web search engines. Information Processing and Management: an International Journal, vol. 57, no. 3. DOI: 10.1016/j.ipm.2019.102193. Online publication date: 1-May-2020.
    • (2019) What Should We Teach in Information Retrieval? ACM SIGIR Forum, vol. 52, no. 2, pp. 19-39. DOI: 10.1145/3308774.3308780. Online publication date: 17-Jan-2019.
    • (2019) Clarifying False Memories in Voice-based Search. Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, pp. 331-335. DOI: 10.1145/3295750.3298961. Online publication date: 8-Mar-2019.