Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Graph Wasserstein Correlation Analysis for Movie Retrieval

  • Conference paper
  • First Online:
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12370))

Included in the following conference series:

Abstract

Movie graphs play an important role to bridge heterogenous modalities of videos and texts in human-centric retrieval. In this work, we propose Graph Wasserstein Correlation Analysis (GWCA) to deal with the core issue therein, i.e, cross heterogeneous graph comparison. Spectral graph filtering is introduced to encode graph signals, which are then embedded as probability distributions in a Wasserstein space, called graph Wasserstein metric learning. Such a seamless integration of graph signal filtering together with metric learning results in a surprise consistency on both learning processes, in which the goal of metric learning is just to optimize signal filters or vice versa. Further, we derive the solution of the graph comparison model as a classic generalized eigenvalue decomposition problem, which has an exactly closed-form solution. Finally, GWCA together with movie/text graphs generation are unified into the framework of movie retrieval to evaluate our proposed method. Extensive experiments on MovieGrpahs dataset demonstrate the effectiveness of our GWCA as well as the entire framework.

X. Zhang and T. Zhang – Equal contributions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    We can pad zero values to one of them to produce the same dimensions for them.

References

  1. https://github.com/ageitgey/face_recognition/blob/master/README_Simplified_Chinese.md/

  2. Belkin, M., Niyogi, P.: Laplacian Eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)

    Article  Google Scholar 

  3. Bojanowski, P., Bach, F., Laptev, I., Ponce, J., Schmid, C., Sivic, J.: Finding actors and actions in movies. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2280–2287 (2013)

    Google Scholar 

  4. Bruna, J., Zaremba, W., Szlam, A., Lecun, Y.: Spectral networks and locally connected networks on graphs. Comput. Sci. (2014)

    Google Scholar 

  5. Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. Multiscale Model. Simul. 4(2), 490–530 (2005)

    Article  MathSciNet  Google Scholar 

  6. Chen, X., Zitnick, C.L.: Learning a recurrent visual representation for image caption generation. arXiv preprint arXiv:1411.5654 (2014)

  7. Chung, F.R.: Lectures on spectral graph theory. CBMS Lect. Fresno 6, 17–21 (1996)

    Google Scholar 

  8. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)

  9. Cour, T., Jordan, C., Miltsakaki, E., Taskar, B.: Movie/Script: alignment and parsing of video and text transcription. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5305, pp. 158–171. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88693-8_12

    Chapter  Google Scholar 

  10. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in neural information processing systems, pp. 3844–3852 (2016)

    Google Scholar 

  11. Ding, L., Yilmaz, A.: Learning relations among movie characters: a social network perspective. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 410–423. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_30

    Chapter  Google Scholar 

  12. Donoho, D.L., Grimes, C.: Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc. Natl. Acad. Sci. 100(10), 5591–5596 (2003)

    Article  MathSciNet  Google Scholar 

  13. Farhadi, A., et al.: Every picture tells a story: generating sentences from images. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 15–29. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_2

    Chapter  Google Scholar 

  14. Hoory, S., Linial, N., Wigderson, A.: Expander graphs and their applications. Bull. Am. Math. Soc. 43(4), 439–561 (2006)

    Article  MathSciNet  Google Scholar 

  15. Jiang, J., Cui, Z., Xu, C., Yang, J.: Gaussian-induced convolution for graphs. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 4007–4014 (2019)

    Google Scholar 

  16. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

  17. Lafon, S., Lee, A.B.: Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Trans. Pattern Anal. Mach. Intell. 28(9), 1393–1403 (2006)

    Article  Google Scholar 

  18. Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)

    Google Scholar 

  19. Li, B., Li, X., Zhang, Z., Wu, F.: Spatio-temporal graph routing for skeleton-based action recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8561–8568 (2019)

    Google Scholar 

  20. Narang, S.K., Ortega, A.: Lifting based wavelet transforms on graphs. In: Proceedings: APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference, Asia-Pacific Signal and Information Processing Association, 2009 Annual ..., pp. 441–444 (2009)

    Google Scholar 

  21. Ordonez, V., Kulkarni, G., Berg, T.L.: Im2Text: describing images using 1 million captioned photographs. In: Advances in Neural Information Processing Systems, pp. 1143–1151 (2011)

    Google Scholar 

  22. Pan, Y., Mei, T., Yao, T., Li, H., Rui, Y.: Jointly modeling embedding and translation to bridge video and language. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4594–4602 (2016)

    Google Scholar 

  23. Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

    Google Scholar 

  24. Ron, D., Safro, I., Brandt, A.: Relaxation-based coarsening and multiscale graph organization. Multiscale Model. Simul. 9(1), 407–423 (2011)

    Article  MathSciNet  Google Scholar 

  25. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Sci. 290(5500), 2323–2326 (2000)

    Article  Google Scholar 

  26. Sankar, P., Jawahar, C., Zisserman, A.: Subtitle-free movie to script alignment. In: Proceedings of the British Machine Vision Conference, pp. 121:1–121:11 (2009)

    Google Scholar 

  27. Shuman, D.I., Narang, S.K., Frossard, P., Ortega, A., Vandergheynst, P.: The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30(3), 83–98 (2013)

    Article  Google Scholar 

  28. Tapaswi, M., Zhu, Y., Stiefelhagen, R., Torralba, A., Urtasun, R., Fidler, S.: MovieQA: understanding stories in movies through question-answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4631–4640 (2016)

    Google Scholar 

  29. Tenenbaum, J.B., De Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Sci. 290(5500), 2319–2323 (2000)

    Article  Google Scholar 

  30. Vicol, P., Tapaswi, M., Castrejon, L., Fidler, S.: MovieGraphs: towards understanding human-centric situations from videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8581–8590 (2018)

    Google Scholar 

  31. Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

    Article  MathSciNet  Google Scholar 

  32. Xingjian, S., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.C.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in neural information processing systems, pp. 802–810 (2015)

    Google Scholar 

  33. Yang, Y., Teo, C.L., Daumé III, H., Aloimonos, Y.: Corpus-guided sentence generation of natural images. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 444–454. Association for Computational Linguistics (2011)

    Google Scholar 

  34. Zhang, T., et al.: Cross-graph convolution learning for large-scale text-picture shopping guide in e-commerce search. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 1657–1666. IEEE (2020)

    Google Scholar 

  35. Zhao, W., Cui, Z., Xu, C., Li, C., Zhang, T., Yang, J.: Hashing graph convolution for node classification. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 519–528 (2019)

    Google Scholar 

Download references

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grants Nos. 61906094, 61972204), the Natural Science Foundation of Jiangsu Province (Grant Nos. BK20190019, BK20190452), and the fundamental research funds for the central universities (No. 30919011232).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zhen Cui .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 128 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zhang, X., Zhang, T., Hong, X., Cui, Z., Yang, J. (2020). Graph Wasserstein Correlation Analysis for Movie Retrieval. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12370. Springer, Cham. https://doi.org/10.1007/978-3-030-58595-2_26

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58595-2_26

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58594-5

  • Online ISBN: 978-3-030-58595-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics