
A visual approach for video geocoding using bag-of-scenes

Authors: Otávio A. B. Penatti, Jurandy Almeida, Ricardo da S. Torres

ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval

Article No.: 53, Pages 1 - 8

https://doi.org/10.1145/2324796.2324857

Published: 05 June 2012

Abstract

This paper presents a novel approach for video representation, called bag-of-scenes. The proposed method is based on dictionaries of scenes, which provide a high-level representation for videos. Scenes carry much more semantic information than local features, especially for geotagging videos using visual content. Thus, each component of the representation model has self-contained semantics and can be directly related to a specific place of interest. Experiments were conducted in the context of the MediaEval 2011 Placing Task. The reported results compare our strategy with those of other participants that used only visual content to accomplish this task. Although the visual dictionary is generated in a very simple way, by selecting photos at random, the results show that our approach achieves high accuracy relative to state-of-the-art solutions.
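The idea in the abstract can be illustrated with a minimal sketch. It assumes that photos and video frames are already described by fixed-length feature vectors, that each frame is hard-assigned to its nearest scene under Euclidean distance, and that a location is estimated by copying the coordinates of the most similar training video. None of these choices is stated in the abstract, and the function names (`build_scene_dictionary`, `bag_of_scenes`, `geocode`) are hypothetical, so this should be read as one possible instance of the approach rather than the authors' implementation.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_scene_dictionary(photo_features, size, seed=0):
    """Pick `size` photo feature vectors at random to serve as scenes.

    Random selection mirrors the abstract; the seed is only for repeatability.
    """
    rng = random.Random(seed)
    return rng.sample(photo_features, size)


def bag_of_scenes(frame_features, scenes):
    """Represent a video as a normalized histogram over dictionary scenes.

    Assumption: each frame votes for its single nearest scene (hard assignment).
    """
    hist = [0.0] * len(scenes)
    for frame in frame_features:
        nearest = min(range(len(scenes)),
                      key=lambda i: euclidean(frame, scenes[i]))
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def geocode(query_hist, train_hists, train_locations):
    """Assumption: return the (lat, lon) of the most similar training video."""
    best = min(range(len(train_hists)),
               key=lambda i: euclidean(query_hist, train_hists[i]))
    return train_locations[best]


# Toy usage with made-up 2-D features and made-up coordinates.
scenes = [[0.0, 0.0], [10.0, 10.0]]
train_hists = [
    bag_of_scenes([[0.1, 0.2], [0.3, -0.1], [9.8, 10.1]], scenes),
    bag_of_scenes([[9.9, 10.0], [10.2, 9.7]], scenes),
]
train_locations = [(-22.9, -47.1), (22.3, 114.2)]
query_hist = bag_of_scenes([[0.0, 0.1], [0.2, 0.0]], scenes)
estimate = geocode(query_hist, train_hists, train_locations)
```

Because every bin of the histogram corresponds to one concrete scene, a large value in a bin can be read directly as "this video looks like that place", which is the self-contained semantics the abstract refers to.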



Published In

ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval

June 2012

489 pages

ISBN: 9781450313292

DOI: 10.1145/2324796

Conference Chairs: Horace H. S. Ip (City University of Hong Kong), Yong Rui (Microsoft, China)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article


Conference

ICMR '12

Sponsor:

SIGMM

ICMR '12: International Conference on Multimedia Retrieval

June 5 - 8, 2012

Hong Kong, China

Acceptance Rates

ICMR '12 Paper Acceptance Rate 50 of 145 submissions, 34%;

Overall Acceptance Rate 254 of 830 submissions, 31%
