A survey of Web clustering engines

Authors: Claudio Carpineto, Stanislaw Osiński, Giovanni Romano, Dawid Weiss

ACM Computing Surveys (CSUR), Volume 41, Issue 3

Article No.: 17, Pages 1 - 38

https://doi.org/10.1145/1541880.1541884

Published: 30 July 2009

Abstract

Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization. Search results clustering, the core of the system, has specific requirements that cannot be addressed by classical clustering algorithms. We emphasize the role played by the quality of the cluster labels as opposed to optimizing only the clustering structure. We highlight the main characteristics of a number of existing Web clustering engines and also discuss how to evaluate their retrieval performance. Some directions for future research are finally presented.

Index Terms

A survey of Web clustering engines
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published In

ACM Computing Surveys Volume 41, Issue 3

July 2009

284 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/1541880

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 July 2009

Accepted: 01 August 2008

Revised: 01 May 2008

Received: 01 December 2007

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 301
Total Downloads: 6,761
Downloads (Last 12 months): 50
Downloads (Last 6 weeks): 4

Reflects downloads up to 20 Feb 2025
