YFCC100M: the new data in multimedia research

Authors: David A. Shamma, Gerald Friedland, Benjamin Elizalde, Douglas Poland, Li-Jia Li

Communications of the ACM, Volume 59, Issue 2

Pages 64 - 73

https://doi.org/10.1145/2812802

Published: 25 January 2016

Abstract

This publicly available curated dataset of almost 100 million photos and videos is free and legal for all.

Index Terms

YFCC100M: the new data in multimedia research
1. Information systems
  1. Data management systems
    1. Information integration
  2. Information systems applications
    1. Data mining

Published In

Communications of the ACM Volume 59, Issue 2

February 2016

110 pages

ISSN:0001-0782

EISSN:1557-7317

DOI:10.1145/2886013

Editor:
Moshe Y. Vardi
Association for Computing Machinery, New York, NY

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Popular
Refereed

Article Metrics

Total Citations: 1,014
Total Downloads: 39,442
Downloads (last 12 months): 2,074
Downloads (last 6 weeks): 163

Reflects downloads up to 27 Jul 2024.

Cited By

Sumsion A, Torrie S, Lee D, Sun Z (2024). Surveying Racial Bias in Facial Recognition: Balancing Datasets and Algorithmic Enhancements. Electronics 13:12 (2317). Online publication date: 13-Jun-2024.
https://doi.org/10.3390/electronics13122317
Chen W, Miao L, Gui J, Wang Y, Li Y (2024). FLsM: Fuzzy Localization of Image Scenes Based on Large Models. Electronics 13:11 (2106). Online publication date: 29-May-2024.
https://doi.org/10.3390/electronics13112106
Waqas A, Tripathi A, Ramachandran R, Stewart P, Rasool G (2024). Multimodal data integration for oncology in the era of deep neural networks: a review. Frontiers in Artificial Intelligence 7. Online publication date: 25-Jul-2024.
https://doi.org/10.3389/frai.2024.1408843
Kil H, Kim J, Lee H (2024). Digital Image Forgery Detection Framework based on Image Feature Enhancement and Ensemble Learning. The Journal of Korean Institute of Information Technology 22:6 (41-53). Online publication date: 30-Jun-2024.
https://doi.org/10.14801/jkiit.2024.22.6.41
Wu J, Ngo C, Chan W, Gurrin C, Kongkachandra R, Schoeffmann K, Dang-Nguyen D, Rossetto L, Satoh S, Zhou L (2024). Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank. Proceedings of the 2024 International Conference on Multimedia Retrieval (73-82). Online publication date: 30-May-2024.
https://dl.acm.org/doi/10.1145/3652583.3658052
Ueki K, Suzuki Y, Takushima H, Hori T (2024). Improving Video Retrieval Performance with Query Expansion Using ChatGPT. Proceedings of the 2024 7th International Conference on Image and Graphics Processing (431-436). Online publication date: 19-Jan-2024.
https://dl.acm.org/doi/10.1145/3647649.3647716
Connor R, Vadicamo L (2024). nSimplex Zen: A Novel Dimensionality Reduction for Euclidean and Hilbert Spaces. ACM Transactions on Knowledge Discovery from Data 18:6 (1-44). Online publication date: 12-Apr-2024.
https://dl.acm.org/doi/10.1145/3647642
Bakhtiarnia A, Zhang Q, Iosifidis A (2024). Efficient High-Resolution Deep Learning: A Survey. ACM Computing Surveys 56:7 (1-35). Online publication date: 9-Apr-2024.
https://dl.acm.org/doi/10.1145/3645107
Gonzalez Penuela R, Collins J, Bennett C, Azenkot S (2024). Investigating Use Cases of AI-Powered Scene Description Applications for Blind and Low Vision People. Proceedings of the CHI Conference on Human Factors in Computing Systems (1-21). Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642211
Liu Z, Dou G, Chien E, Zhang C, Tian Y, Zhu Z, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). Breaking the Trilemma of Privacy, Utility, and Efficiency via Controllable Machine Unlearning. Proceedings of the ACM on Web Conference 2024 (1260-1271). Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645669
