research-article

A deep learned method for video indexing and retrieval

Authors:

X. Li

PG '18: Proceedings of the 26th Pacific Conference on Computer Graphics and Applications: Short Papers

Pages 85 - 88

https://doi.org/10.2312/pg.20181287

Published: 08 October 2018

Abstract

We propose a deep-neural-network-based method for content-based video retrieval. The approach uses deep neural networks to extract semantic information from video and a graph-based storage structure to build the video indices. We devise two models: the Inception-Single Shot Multibox Detector (ISSD), which extracts spatial semantic information (objects), and RI3D, which extracts temporal semantic information (actions). ISSD achieves a mAP of 26.7% on the MS COCO dataset, an improvement of 3.2% over the original SSD model, and RI3D achieves a top-1 accuracy of 97.7% on the UCF-101 dataset. The video index is built as a graph over the temporal and spatial semantic information. Our experimental results show that deep-learned semantic information is highly effective for video indexing and retrieval.
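The abstract does not specify the layout of the graph index, so the following is only a minimal sketch under stated assumptions: detected objects and actions are treated as label nodes, video segments as segment nodes, and an edge links a label to each segment in which it was detected. The class name VideoGraphIndex, the methods add_segment and query, and the sample labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class VideoGraphIndex:
    """Toy bipartite graph: label nodes <-> video segment nodes (assumed layout)."""

    def __init__(self):
        # Edges from a label node, e.g. ("object", "dog"), to segment nodes.
        self.label_to_segments = defaultdict(set)
        # Reverse edges from a segment node to its label nodes.
        self.segment_to_labels = defaultdict(set)

    def add_segment(self, video_id, start_s, end_s, objects=(), actions=()):
        """Insert one segment and link it to its object and action labels."""
        segment = (video_id, start_s, end_s)
        for kind, names in (("object", objects), ("action", actions)):
            for name in names:
                label = (kind, name)
                self.label_to_segments[label].add(segment)
                self.segment_to_labels[segment].add(label)
        return segment

    def query(self, objects=(), actions=()):
        """Return the segments linked to every requested label, sorted."""
        labels = [("object", o) for o in objects] + [("action", a) for a in actions]
        if not labels:
            return []
        # Intersect the neighbour sets of all requested label nodes.
        hits = set(self.label_to_segments.get(labels[0], set()))
        for label in labels[1:]:
            hits &= self.label_to_segments.get(label, set())
        return sorted(hits)


# Hypothetical detections, standing in for ISSD (objects) and RI3D (actions) output.
index = VideoGraphIndex()
index.add_segment("v1", 0, 5, objects=["dog", "person"], actions=["walking"])
index.add_segment("v1", 5, 9, objects=["dog"], actions=["jumping"])
index.add_segment("v2", 0, 4, objects=["person"], actions=["walking"])

results = index.query(objects=["person"], actions=["walking"])
```

In this sketch a query combining an object and an action reduces to a set intersection over the neighbours of the corresponding label nodes. Ranking, WordNet-style label expansion, and persistent storage are left out.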


Index Terms

A deep learned method for video indexing and retrieval
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Visual content-based indexing and retrieval


Published In

PG '18: Proceedings of the 26th Pacific Conference on Computer Graphics and Applications: Short Papers

October 2018

101 pages

ISBN: 9783038680734

General Chairs: Hujun Bao (Zhejiang University), Horace H. S. Ip (City University of Hong Kong), Hans-Peter Seidel (Max-Planck-Institut für Informatik, Germany), Alla Sheffer (University of British Columbia)

Program Chairs: Hongbo Fu (City University of Hong Kong), Abhijeet Ghosh (Imperial College London), Johannes Kopf (Facebook Research)

Publisher

Eurographics Association

Goslar, Germany
