Abstract
We present the main algorithmic challenges that large Web search engines face today. These challenges arise in every module of a Web retrieval system, from gathering the data to be indexed (crawling) to selecting and ordering the answers to a query (searching and ranking). Most of them ultimately concern either the quality of the answers or the efficiency of obtaining them, although some bear on the very existence of current search engines, notably context-based advertising.
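To make the split between selecting and ordering answers concrete, here is a minimal sketch of conjunctive query evaluation over a toy inverted index. The index layout, the hypothetical `intersect` and `search` helpers, and the use of a single precomputed, query-independent score per document are illustrative assumptions, not the method described in this paper. Production engines use compressed indexes and considerably more elaborate intersection and ranking methods.

```python
from typing import Dict, List

# Hypothetical toy inverted index: term -> sorted list of document ids.
Index = Dict[str, List[int]]


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge-style intersection of two sorted posting lists in O(|a| + |b|)."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index: Index, query: List[str], score: Dict[int, float], k: int = 10) -> List[int]:
    """Selection: keep documents that contain every query term.
    Ranking: order them by a precomputed per-document score (an assumption)."""
    if not query:
        return []
    # Process the shortest posting list first: an intersection is never larger
    # than its smallest input, so intermediate results stay small.
    lists = sorted((index.get(t, []) for t in query), key=len)
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    # Highest score first; documents without a score default to 0.0.
    return sorted(result, key=lambda d: score.get(d, 0.0), reverse=True)[:k]
```

For example, with `index = {"web": [1, 2, 4, 7], "search": [2, 3, 4, 8], "engine": [2, 4, 5]}` and `score = {2: 0.3, 4: 0.9}`, the query `["web", "search", "engine"]` selects documents 2 and 4 and returns them as `[4, 2]`. Separating the two steps this way shows why both matter: selection cost drives efficiency, while the score drives answer quality.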
Because the Web grows and changes rapidly, the design of algorithms that address these challenges must rely on large-scale experimentation, large both in data volume and in computation time, to understand the main issues that affect them. We show examples from our own research and from the state of the art. The full version of this paper appears in [1].
References
Baeza-Yates, R.: Algorithmic Challenges in Web Search Engines. In: Correa, J.R., Hevia, A., Kiwi, M. (eds.) LATIN 2006. LNCS, vol. 3887, pp. 1–7. Springer, Heidelberg (2006)
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval, p. 513. Addison-Wesley, England (1999)
Baeza-Yates, R.: Information Retrieval in the Web: beyond current search engines. Int. Journal of Approximate Reasoning 34(2-3), 97–104 (2003)
Baeza-Yates, R., Castillo, C., Marin, M., Rodriguez, A.: Crawling a Country: Better Strategies than Breadth-First for Page Ordering. In: WWW 2005, Industrial Track, ACM Press, Chiba, Japan (2005)
Baeza-Yates, R.A., Hurtado, C.A., Mendoza, M.: Query Recommendation Using Query Logs in Search Engines. In: Lindner, W., Mesiti, M., Türker, C., Tzitzikas, Y., Vakali, A.I. (eds.) EDBT 2004. LNCS, vol. 3268, pp. 588–596. Springer, Heidelberg (2004)
Baeza-Yates, R.: A Fast Set Intersection Algorithm for Sorted Sequences. In: 15th Combinatorial Pattern Matching (CPM 2004). LNCS, Springer, Istanbul, Turkey (2004)
Baeza-Yates, R.: Applications of Web Query Mining. In: Losada, D.E., Fernández-Luna, J.M. (eds.) ECIR 2005. LNCS, vol. 3408, Springer, Heidelberg (2005)
Baeza-Yates, R., Poblete, B.: A Website Mining Model Centered on User Queries. In: Berendt, B., et al. (eds.) European Web Mining Forum, Oporto, Portugal, October 2005, pp. 3–15 (2005)
Baeza-Yates, R., Pereira, A., Ziviani, N.: WIM: A Web Information Mining Model for the Web. In: LA-WEB 2005, pp. 233–241. IEEE CS Press, Los Alamitos (2005)
Bhargava, H.K., Feng, J.: Paid placement strategies for internet search engines. In: Proceedings of the eleventh international conference on World Wide Web, pp. 117–123. ACM Press, New York (2002)
Chakrabarti, S.: Mining the Web: Discovering knowledge from hypertext data. Morgan Kaufmann, San Francisco (2003)
Davison, B.: Workshop on Adversarial Information Retrieval on the Web, Chiba, Japan (May 2005), http://airweb.cse.lehigh.edu/2005/
Kleinberg, J.: Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5), 604–632 (1999); Preliminary version presented at SODA 1998
Kleinberg, J., Raghavan, P.: Query Incentive Networks. In: Proc. 46th IEEE Symposium on Foundations of Computer Science (2005)
Koster, M.: A standard for robot exclusion (1996), http://www.robotstxt.org/wc/exclusion.html
Mäkinen, V., Navarro, G.: Compressed Full Text Indexes. Technical Report TR/DCC-, -7, Dept. of Computer Science, University of Chile (June 2005), Available at: http://pizzachili.dcc.uchile.cl/biblio.html
Nicholson, S., Sierra, T., Eseryel, U.Y., Park, J.H., Barkow, P., Pozo, E.J., Ward, J.: How Much of It is Real? Analysis of Paid Placement in Web Search Engine Results. In: JASIST (2005)
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the Web. Technical report, Stanford Digital Library Technologies Project (1998)
Ribeiro-Neto, B., Cristo, M., Golgher, P., Silva de Moura, E.: Impedance coupling in content-targeted advertising. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2005, Salvador, Brazil, August 15–19, 2005, pp. 496–503. ACM Press, New York (2005)
Wellman, B.: Computer Networks As Social Networks. Science 293(5537), 2031–2034 (2001)
Yao, A.C.-C. (ed.): WINE 2005. LNCS, vol. 3828. Springer, Heidelberg (2005), http://www.cs.cityu.edu.hk/~wine2005/
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baeza-Yates, R. (2006). Algorithmic Challenges in Web Search Engines. In: Àlvarez, C., Serna, M. (eds) Experimental Algorithms. WEA 2006. Lecture Notes in Computer Science, vol 4007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11764298_25
DOI: https://doi.org/10.1007/11764298_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34597-8
Online ISBN: 978-3-540-34598-5
eBook Packages: Computer Science, Computer Science (R0)