DOI: 10.2312/pg.20181287

A deep learned method for video indexing and retrieval

Published: 08 October 2018


In this paper, we propose a deep neural network-based method for content-based video retrieval. Our approach uses deep neural networks to extract semantic information from videos and introduces a graph-based storage structure to build the video indices. We devise two models: the Inception-Single Shot Multibox Detector (ISSD), which extracts spatial semantic information (objects), and the RI3D model, which extracts temporal semantic information (actions). The ISSD model achieves a mAP of 26.7% on the MS COCO dataset, an increase of 3.2% over the original SSD model, while the RI3D model achieves a top-1 accuracy of 97.7% on the UCF-101 dataset. We then use the graph structure to build the video index from the extracted spatial and temporal semantic information. Our experimental results show that deep-learned semantic information is highly effective for video indexing and retrieval.
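The abstract does not specify the schema of the graph-based index. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a bipartite graph that links video segments to concept nodes, where objects come from a detector such as ISSD and actions come from an action classifier such as RI3D. Each edge is weighted by the model's confidence, and "next" edges link consecutive segments of the same video. The detector and classifier outputs are stubbed as plain dictionaries. All class, method, and label names (`Segment`, `SemanticVideoGraph`, `add_segment`, `query`, `"Surfing"`, and so on) are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical video segment: a video id plus a [start, end) range in seconds."""
    video_id: str
    start: float
    end: float


class SemanticVideoGraph:
    """Assumed bipartite index linking segments to (kind, label) concept nodes.

    kind is "object" (spatial semantics) or "action" (temporal semantics).
    """

    def __init__(self) -> None:
        # concept node -> {segment: confidence of the edge}
        self._concept_edges: dict[tuple[str, str], dict[Segment, float]] = defaultdict(dict)
        # segment -> following segment of the same video (temporal edge)
        self._next: dict[Segment, Segment] = {}

    def add_segment(self, segment: Segment, objects: dict[str, float],
                    actions: dict[str, float], prev: Segment | None = None) -> None:
        """Index one segment from stubbed detector/classifier outputs."""
        for label, score in objects.items():
            self._link(("object", label), segment, score)
        for label, score in actions.items():
            self._link(("action", label), segment, score)
        if prev is not None:
            self._next[prev] = segment

    def _link(self, concept: tuple[str, str], segment: Segment, score: float) -> None:
        edges = self._concept_edges[concept]
        # Keep the highest confidence if a concept is reported more than once.
        edges[segment] = max(score, edges.get(segment, 0.0))

    def query(self, concepts: list[tuple[str, str]]) -> list[Segment]:
        """Return segments linked to every query concept, highest summed confidence first."""
        if not concepts:
            return []
        # .get avoids inserting empty entries into the defaultdict for unknown concepts.
        candidate_sets = [set(self._concept_edges.get(c, {})) for c in concepts]
        hits = set.intersection(*candidate_sets)
        scored = [(sum(self._concept_edges[c][s] for c in concepts), s) for s in hits]
        scored.sort(key=lambda item: (-item[0], item[1].video_id, item[1].start))
        return [s for _, s in scored]

    def next_segment(self, segment: Segment) -> Segment | None:
        """Follow the temporal edge to the next segment, if any."""
        return self._next.get(segment)


# Usage with made-up detections for illustration only.
graph = SemanticVideoGraph()
s1 = Segment("v1", 0.0, 4.0)
s2 = Segment("v1", 4.0, 8.0)
s3 = Segment("v2", 0.0, 5.0)
graph.add_segment(s1, {"person": 0.9, "surfboard": 0.8}, {"Surfing": 0.95})
graph.add_segment(s2, {"person": 0.7}, {"Surfing": 0.6}, prev=s1)
graph.add_segment(s3, {"person": 0.8, "dog": 0.9}, {"WalkingWithDog": 0.9})
result = graph.query([("object", "person"), ("action", "Surfing")])
print(result)
```

Here a query that combines an object and an action intersects the segment sets of both concept nodes. The query returns `s1` (summed confidence 1.85) before `s2` (1.3) and excludes `s3`. The summed-confidence ranking is an assumption chosen for simplicity; the paper may score matches differently.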





Published In

PG '18: Proceedings of the 26th Pacific Conference on Computer Graphics and Applications: Short Papers
October 2018
101 pages

Eurographics Association
Goslar, Germany



  • Research-article

