DOI: 10.1145/2324796.2324857
research-article

A visual approach for video geocoding using bag-of-scenes

Published: 05 June 2012

Abstract

This paper presents a novel approach for video representation, called bag-of-scenes. The proposed method is based on dictionaries of scenes, which provide a high-level representation for videos. Scenes are elements with much more semantic information than local features, especially for geotagging videos using visual content. Thus, each component of the representation model has self-contained semantics and, hence, can be directly related to a specific place of interest. Experiments were conducted in the context of the MediaEval 2011 Placing Task. The reported results compare our strategy with those of other participants that used only visual content to accomplish this task. Despite the very simple way in which we generate the visual dictionary, by selecting photos at random, the results show that our approach achieves high accuracy relative to state-of-the-art solutions.
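
The abstract describes the pipeline only at a high level. The Python sketch below illustrates one plausible reading of it, under assumptions that are ours rather than the paper's: every photo and video frame is already described by a fixed-length global feature vector (the extractor is left unspecified), each frame is hard-assigned to its nearest scene by Euclidean distance, the assignments are average-pooled into a normalized histogram, and a test video inherits the location of the training video with the closest bag-of-scenes (1-nearest-neighbour). All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Location = Tuple[float, float]  # (latitude, longitude); assumed format


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_scene_dictionary(photo_features: List[Vector], size: int,
                           seed: int = 0) -> List[Vector]:
    """Hypothetical helper: pick `size` photo feature vectors at random
    to serve as the scene dictionary (the abstract states photos are
    taken at random; the descriptor itself is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(photo_features, size)


def bag_of_scenes(frame_features: List[Vector],
                  dictionary: List[Vector]) -> List[float]:
    """Hard-assign each frame to its nearest scene, then average-pool
    the counts into a histogram that sums to 1 (assumed coding/pooling)."""
    histogram = [0.0] * len(dictionary)
    for frame in frame_features:
        nearest = min(range(len(dictionary)),
                      key=lambda i: euclidean(frame, dictionary[i]))
        histogram[nearest] += 1.0
    total = sum(histogram)
    # A video with no frames keeps an all-zero histogram.
    return [count / total for count in histogram] if total else histogram


def geocode(query: List[float],
            training: List[Tuple[List[float], Location]]) -> Location:
    """Assumed 1-NN location transfer: return the location of the training
    video whose bag-of-scenes vector is closest to the query."""
    _, location = min(training, key=lambda item: euclidean(query, item[0]))
    return location
```

For example, with a two-scene dictionary `[[0, 0], [10, 10]]`, the frames `[[1, 1], [0, 2], [9, 9]]` produce the histogram `[2/3, 1/3]`, because two frames fall nearest the first scene and one nearest the second. Soft assignment or max pooling could replace the hard-assignment and average-pooling choice without changing the overall structure of the sketch.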

Published In

ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
June 2012
489 pages
ISBN:9781450313292
DOI:10.1145/2324796
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. geotagging
  2. placing task
  3. video representation
  4. visual words

Conference

ICMR '12

Acceptance Rates

ICMR '12 Paper Acceptance Rate 50 of 145 submissions, 34%;
Overall Acceptance Rate 254 of 830 submissions, 31%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 23 Dec 2024.

Cited By

  • (2024) Advanced Techniques for Geospatial Referencing in Online Media Repositories. Future Internet 16:3 (87). DOI: 10.3390/fi16030087. Online publication date: 1-Mar-2024
  • (2022) Urban Image Geo-Localization Using Open Data on Public Spaces. Proceedings of the 19th International Conference on Content-based Multimedia Indexing, pp. 50-56. DOI: 10.1145/3549555.3549589. Online publication date: 14-Sep-2022
  • (2022) Digital Library Applications. Online publication date: 10-Mar-2022
  • (2020) Faster and Accurate Compressed Video Action Recognition Straight from the Frequency Domain. 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pp. 62-68. DOI: 10.1109/SIBGRAPI51738.2020.00017. Online publication date: Nov-2020
  • (2019) Multimedia Retrieval Through Unsupervised Hypergraph-Based Manifold Ranking. IEEE Transactions on Image Processing 28:12, pp. 5824-5838. DOI: 10.1109/TIP.2019.2920526. Online publication date: Dec-2019
  • (2019) CV-C3D: Action Recognition on Compressed Videos with Convolutional 3D Networks. 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pp. 24-30. DOI: 10.1109/SIBGRAPI.2019.00012. Online publication date: Oct-2019
  • (2018) Learning and Fusing Multimodal Deep Features for Acoustic Scene Categorization. Proceedings of the 26th ACM International Conference on Multimedia, pp. 1892-1900. DOI: 10.1145/3240508.3240631. Online publication date: 15-Oct-2018
  • (2018) Encoded Semantic Tree for Automatic User Profiling Applied to Personalized Video Summarization. IEEE Transactions on Circuits and Systems for Video Technology 28:1, pp. 181-192. DOI: 10.1109/TCSVT.2016.2602832. Online publication date: 1-Jan-2018
  • (2018) Bag of Attributes for Video Event Retrieval. 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pp. 447-454. DOI: 10.1109/SIBGRAPI.2018.00064. Online publication date: Oct-2018
  • (2017) A Rank Aggregation Framework for Video Interestingness Prediction. Image Analysis and Processing - ICIAP 2017, pp. 3-14. DOI: 10.1007/978-3-319-68560-1_1. Online publication date: 13-Oct-2017