DOI: 10.1145/1646396.1646452
Poster

NUS-WIDE: a real-world web image database from National University of Singapore

Published: 08 July 2009

Abstract

This paper introduces a web image dataset created by NUS's Lab for Media Search. The dataset includes: (1) 269,648 images and their associated tags from Flickr, with a total of 5,018 unique tags; (2) six types of low-level features extracted from these images: a 64-D color histogram, a 144-D color correlogram, a 73-D edge direction histogram, a 128-D wavelet texture, 225-D block-wise color moments extracted over 5x5 fixed grid partitions, and a 500-D bag of words based on SIFT descriptors; and (3) ground truth for 81 concepts that can be used for evaluation. Based on this dataset, we highlight characteristics of web image collections and identify four research issues in web image annotation and retrieval. We also provide baseline results for web image annotation by learning from the tags using the traditional k-NN algorithm. The benchmark results indicate that it is possible to learn effective models from a sufficiently large image dataset to facilitate general image retrieval.
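
To make two of the ideas in the abstract concrete, here is a minimal Python sketch of block-wise color moments over a 5x5 grid and of a k-NN tag-voting baseline. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives only the feature dimensions, so the use of three moments (mean, standard deviation, skewness) per channel, which happens to give 5 x 5 x 3 x 3 = 225 values, is an assumption, as are the Euclidean distance, the voting rule, and all function names (`block_color_moments`, `knn_tag_scores`).

```python
import math

# Feature dimensions exactly as listed in the abstract.
FEATURE_DIMS = {
    "color_histogram": 64,
    "color_correlogram": 144,
    "edge_direction_histogram": 73,
    "wavelet_texture": 128,
    "block_color_moments": 225,
    "sift_bag_of_words": 500,
}


def block_color_moments(image, grid=5):
    """Mean, standard deviation, and skewness per channel for each grid block.

    `image` is a list of rows; each row is a list of 3-channel pixel tuples.
    Using three moments per channel is an assumption (not stated in the
    abstract); it yields grid * grid * 3 * 3 values.
    """
    height, width = len(image), len(image[0])
    features = []
    for bi in range(grid):
        r0, r1 = bi * height // grid, (bi + 1) * height // grid
        for bj in range(grid):
            c0, c1 = bj * width // grid, (bj + 1) * width // grid
            pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            for ch in range(3):
                values = [p[ch] for p in pixels]
                mean = sum(values) / len(values)
                var = sum((v - mean) ** 2 for v in values) / len(values)
                third = sum((v - mean) ** 3 for v in values) / len(values)
                # Signed cube root of the third central moment.
                skew = math.copysign(abs(third) ** (1 / 3), third)
                features.extend([mean, math.sqrt(var), skew])
    return features


def knn_tag_scores(query, train_features, train_tags, k=3):
    """Score each tag by the fraction of the k nearest training images carrying it.

    Hypothetical voting rule: Euclidean distance, unweighted votes.
    """
    order = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(query, train_features[i]),
    )
    nearest = order[:k]
    scores = {}
    for i in nearest:
        for tag in train_tags[i]:
            scores[tag] = scores.get(tag, 0.0) + 1.0 / len(nearest)
    return scores


if __name__ == "__main__":
    # A flat 10x10 image: every block has zero spread and zero skewness.
    image = [[(10, 20, 30)] * 10 for _ in range(10)]
    feats = block_color_moments(image)
    print(len(feats), sum(FEATURE_DIMS.values()))

    # Toy 2-D features with made-up tags, for illustration only.
    train_features = [(0, 0), (0, 1), (10, 10), (10, 11)]
    train_tags = [{"sky"}, {"sky", "clouds"}, {"dog"}, {"dog"}]
    print(knn_tag_scores((0, 0.2), train_features, train_tags, k=2))
```

In practice the query and training vectors would be one of the six feature types (or a concatenation of them), and the tag scores would be thresholded or ranked to produce annotations; those choices are left open here.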

      Published In

      CIVR '09: Proceedings of the ACM International Conference on Image and Video Retrieval
      July 2009
      383 pages
      ISBN:9781605584805
      DOI:10.1145/1646396
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Flickr
      2. annotation
      3. retrieval
      4. tag refinement
      5. training set construction
      6. web image

      Qualifiers

      • Poster

      Conference

      CIVR '09