
Unsupervised Video Hashing with Multi-granularity Contextualization and Multi-structure Preservation

Authors:

Bin Zhu,

Xiangnan He

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 3754 - 3763

https://doi.org/10.1145/3503161.3547836

Published: 10 October 2022


Abstract

Unsupervised video hashing aims to learn a compact binary vector that represents complex video content without using manual annotations. Existing unsupervised hashing methods generally explore only part of the dependencies (e.g., long-range and short-range) and data structures present in visual content, which results in less discriminative hash codes. In this paper, we propose a Multi-granularity Contextualized and Multi-Structure preserved Hashing (MCMSH) method, which simultaneously explores multiple axial contexts for discriminative video representation generation and various kinds of structural information for unsupervised learning. Specifically, we design three self-gating modules that separately model three granularities of dependencies (i.e., long-, middle-, and short-range dependencies) and densely integrate them into MLP-Mixer for feature contextualization, leading to a novel model, MC-MLP. To facilitate unsupervised learning, we investigate three kinds of data structures, namely clusters, local neighborhood similarity structure, and inter/intra-class variations, and design a multi-objective task to train MC-MLP. These data structures are highly complementary in hash code learning. We conduct extensive experiments on three video retrieval benchmark datasets, demonstrating that MCMSH not only significantly boosts the performance of the backbone MLP-Mixer but also notably outperforms competing methods. Code is available at: https://github.com/haoyanbin918/MCMSH.
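As a rough illustration of why a compact binary code is useful for retrieval, the sketch below turns real-valued video features into bits and ranks a small database by Hamming distance. It is not the MCMSH model: thresholding at zero stands in for the learned hash function, and the function names (`binarize`, `hamming_distance`, `rank_by_hamming`) and the toy feature values are assumptions made only for this example.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Map real-valued features to bits by thresholding at zero.

    Assumption: a fixed threshold replaces the learned hash layer.
    """
    return [1 if x >= 0 else 0 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query: Sequence[int], database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


# Toy, made-up 4-dimensional features for three database videos and one query.
video_features = [
    [0.9, -0.2, 0.4, -0.7],
    [0.8, -0.1, -0.3, -0.6],
    [-0.5, 0.6, -0.4, 0.2],
]
codes = [binarize(f) for f in video_features]
query_code = binarize([0.7, -0.3, 0.5, -0.9])
ranking = rank_by_hamming(query_code, codes)
```

Because comparison reduces to counting differing bits, ranking stays cheap even for large databases; the quality of the ranking then depends entirely on how discriminative the learned codes are, which is the problem the abstract addresses.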

Supplementary Material

MP4 File (mmfp0387.mp4)

This is the presentation video of the paper "Unsupervised Video Hashing with Multi-granularity Contextualization and Multi-structure Preservation", which briefly introduces the paper from three aspects: background, methods and experiments. Our method simultaneously explores multiple axial contexts for discriminative video representation generation and various structural information for unsupervised learning. You can learn more about the paper through the video.


