DOI: 10.1145/1027527.1027747
Article

Hierarchical clustering of WWW image search results using visual, textual and link information

Published: 10 October 2004

Abstract

We consider the problem of clustering Web image search results. Generally, the image search results returned by an image search engine contain multiple topics. Organizing the results into different semantic clusters facilitates users' browsing. In this paper, we propose a hierarchical clustering method using visual, textual and link analysis. By using a vision-based page segmentation algorithm, a web page is partitioned into blocks, and the textual and link information of an image can be accurately extracted from the block containing that image. By using block-level link analysis techniques, an image graph can be constructed. We then apply spectral techniques to find a Euclidean embedding of the images which respects the graph structure. Thus for each image, we have three kinds of representations, i.e. visual feature based representation, textual feature based representation and graph based representation. Using spectral clustering techniques, we can cluster the search results into different semantic clusters. An image search example illustrates the potential of these techniques.
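
The graph-based step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the affinity matrix `W`, the function names, and the toy graph are all assumptions made for illustration, and the paper's block-level link analysis for building the image graph is not reproduced. The sketch embeds the nodes using eigenvectors of the normalized graph Laplacian (in the spirit of Laplacian eigenmaps) and then splits them into two groups by the sign of the first non-trivial coordinate.

```python
import numpy as np


def spectral_embedding(W, dim):
    """Embed graph nodes with eigenvectors of the normalized Laplacian.

    W is a symmetric, non-negative affinity matrix for a hypothetical
    image graph. This sketch assumes every node has at least one edge.
    """
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(L_sym)
    # Skip the trivial first eigenvector, keep the next `dim`, and rescale
    # by D^{-1/2} to get solutions of the generalized problem L y = lambda D y.
    return eigvecs[:, 1:dim + 1] * d_inv_sqrt[:, None]


def two_way_split(W):
    """Partition nodes into two clusters by the sign of the first coordinate."""
    first_coord = spectral_embedding(W, 1)[:, 0]
    return (first_coord > 0).astype(int)


# Toy image graph (assumed data): two tightly linked triples of images,
# joined by a single weak edge between node 2 and node 3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

labels = two_way_split(W)
print(labels)  # nodes 0-2 share one label, nodes 3-5 the other
```

Which group receives label 0 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful. For more than two clusters, a common choice is to run k-means on a higher-dimensional embedding; how the visual, textual, and graph-based representations are combined is not specified in the abstract and is left out here.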

Published In

MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia
October 2004
1028 pages
ISBN:1581138938
DOI:10.1145/1027527

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph model
  2. image clustering
  3. link analysis
  4. search result organization
  5. spectral analysis
  6. vision based page segmentation
  7. web image search

Qualifiers

  • Article

Conference

MM04

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
