DOI: 10.1145/1386352.1386361
research-article

Inferring generic activities and events from image content and bags of geo-tags

Published: 07 July 2008

Abstract

The use of contextual information in building concept detectors for digital media has caught the attention of the multimedia community in recent years. Generally speaking, any information that is extracted from image headers or tags, or from large collections of related images, and used at classification time can be considered contextual. Such information is discriminative in its own right and, when combined with purely content-based detection systems that use pixel information, can significantly improve overall recognition performance. In this paper, we describe a framework for probabilistically modeling geographical information, using a Geographical Information Systems (GIS) database, for event and activity recognition in general-purpose consumer images such as those obtained from Flickr. The proposed framework discriminatively models the statistical saliency of geo-tags in describing an activity or event. Our work leverages the inherent patterns of association between events and their geographical venues. We use descriptions of small local neighborhoods to form bags of geo-tags as our representation. Statistical coherence is observed in such descriptions across a wide range of event classes and across many different users. To test our approach, we identify classes of activities and events in which people commonly participate and take pictures. Images and corresponding metadata for the identified events and activities are obtained from Flickr. We employ visual detectors obtained from Columbia University (Columbia 374), which perform purely visual event and activity recognition. In our experiments, we present the performance advantage obtained by combining contextual GPS information with pixel-based detection systems.
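
The abstract does not give the details of the model, so the following is only a minimal sketch of the two ideas it names: a bag-of-geo-tags representation and the combination of a geo-based score with a visual detector score (late fusion). The function names, the whitespace tokenization, the example vocabulary, and the weighted-average fusion rule are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from typing import List


def bag_of_geo_tags(place_names: List[str], vocabulary: List[str]) -> List[float]:
    """Build a normalized term-frequency vector from nearby place descriptions.

    `place_names` stands in for the text a GIS lookup might return for the
    neighborhood around a photo's GPS coordinates (hypothetical input).
    """
    # Count lowercase, whitespace-separated tokens across all descriptions.
    counts = Counter(
        token
        for name in place_names
        for token in name.lower().split()
    )
    # Normalize over the vocabulary terms only.
    total = sum(counts[term] for term in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[term] / total for term in vocabulary]


def late_fusion(visual_score: float, geo_score: float, weight: float = 0.5) -> float:
    """Combine two per-class scores with a weighted average (assumed rule)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * visual_score + (1.0 - weight) * geo_score


# Hypothetical example: two place descriptions near a photo.
vocabulary = ["beach", "park", "stadium", "pier"]
vector = bag_of_geo_tags(["Sunset Beach Park", "Ocean Beach Pier"], vocabulary)
fused = late_fusion(visual_score=0.25, geo_score=0.75, weight=0.5)
```

In this sketch the bag-of-geo-tags vector would be the input to a separately trained geo-based classifier, and `late_fusion` would be applied to that classifier's output score and the visual detector's score for the same class.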



Published In

CIVR '08: Proceedings of the 2008 international conference on Content-based image and video retrieval
July 2008
674 pages
ISBN: 9781605580708
DOI: 10.1145/1386352
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. concept detection
2. geo-tag
3. image retrieval
4. late fusion
5. reverse geocoding
