Unsupervised Hashing with Semantic Concept Mining

Published: 30 May 2023

Abstract

Many recent unsupervised hashing methods improve unsupervised image retrieval by designing a semantic similarity matrix based on the similarities between image features extracted by a pre-trained CNN model. However, most of these methods tend to ignore the high-level abstract semantic concepts contained in images. Intuitively, concepts play an important role in calculating the similarity among images: in real-world scenarios, each image is associated with some concepts, and two images are more similar the more concepts they share. Inspired by this intuition, we propose a novel method, Unsupervised Hashing with Semantic Concept Mining (UHSCM), which leverages a vision-language pretraining (VLP) model to construct a high-quality similarity matrix. Specifically, a set of randomly chosen concepts is first collected. Then, using a VLP model with prompt engineering, which has shown strong power in visual representation learning, the set of concepts is denoised according to the training images. Next, UHSCM applies the VLP model with prompting again to mine the concept distribution of each image and constructs a high-quality semantic similarity matrix from the mined concept distributions. Finally, with the semantic similarity matrix as guiding information, a novel hashing loss with a regularization term based on a modified contrastive loss is proposed to optimize the hashing network. Extensive experiments on three benchmark datasets show that the proposed method outperforms state-of-the-art baselines on the image retrieval task.
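
The concept-mining step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the concept distribution is taken to be a softmax over image-to-concept cosine similarities, the similarity matrix is taken to be the cosine similarity between those distributions, the `temperature` value is arbitrary, and random vectors stand in for the embeddings a VLP model would produce from images and prompted concept names. The function names `concept_distribution` and `similarity_matrix` are hypothetical.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def concept_distribution(image_emb, concept_embs, temperature=0.07):
    """Softmax over image-to-concept cosine similarities.

    Assumed form: the abstract only says a concept distribution is mined.
    """
    img = l2_normalize(image_emb)
    logits = [dot(img, l2_normalize(c)) / temperature for c in concept_embs]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_matrix(distributions):
    """Pairwise cosine similarity between concept distributions (assumed form)."""
    unit = [l2_normalize(d) for d in distributions]
    return [[dot(a, b) for b in unit] for a in unit]


# Stand-in embeddings. A real pipeline would obtain image embeddings and
# prompted concept-text embeddings from a VLP model instead.
random.seed(0)
dim, n_images, n_concepts = 8, 4, 5
concept_embs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_concepts)]
image_embs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_images)]

dists = [concept_distribution(e, concept_embs) for e in image_embs]
S = similarity_matrix(dists)
```

Under these assumptions, two images whose probability mass falls on the same concepts get an entry of `S` close to 1, which matches the intuition that images sharing more concepts are more similar. The concept-denoising step and the hashing loss are not sketched, because the abstract does not specify them.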

Supplemental Material

MP4 File
Presentation video for the paper "Unsupervised Hashing with Semantic Concept Mining" in SIGMOD 2023

Published In

Proceedings of the ACM on Management of Data  Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. image retrieval
  2. semantic concept mining
  3. unsupervised hashing

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months): 159
  • Downloads (Last 6 weeks): 19
Reflects downloads up to 24 Dec 2024

Cited By

  • (2024) Level of Agreement between Emotions Generated by Artificial Intelligence and Human Evaluation: A Methodological Proposal. Electronics 13:20 (4014). DOI: 10.3390/electronics13204014. Online publication date: 12-Oct-2024
  • (2024) ArcheType: A Novel Framework for Open-Source Column Type Annotation Using Large Language Models. Proceedings of the VLDB Endowment 17:9 (2279-2292). DOI: 10.14778/3665844.3665857. Online publication date: 6-Aug-2024
  • (2024) How Do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses. Proceedings of the VLDB Endowment 17:6 (1391-1404). DOI: 10.14778/3648160.3648178. Online publication date: 3-May-2024
  • (2024) Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks. Proceedings of the ACM on Management of Data 2:3 (1-28). DOI: 10.1145/3654979. Online publication date: 30-May-2024
  • (2024) PreLog: A Pre-trained Model for Log Analytics. Proceedings of the ACM on Management of Data 2:3 (1-28). DOI: 10.1145/3654966. Online publication date: 30-May-2024
  • (2024) Improve Deep Hashing with Language Guidance for Unsupervised Image Retrieval. Proceedings of the 2024 International Conference on Multimedia Retrieval (137-145). DOI: 10.1145/3652583.3658059. Online publication date: 30-May-2024
  • (2024) Discrepancy and Structure-Based Contrast for Test-Time Adaptive Retrieval. IEEE Transactions on Multimedia 26 (8665-8677). DOI: 10.1109/TMM.2024.3381337. Online publication date: 25-Mar-2024
  • (2024) Similarity Transitivity Broken-Aware Multi-Modal Hashing. IEEE Transactions on Knowledge and Data Engineering 36:11 (7003-7014). DOI: 10.1109/TKDE.2024.3396492. Online publication date: Nov-2024
  • (2024) Exploring Hierarchical Information in Hyperbolic Space for Self-Supervised Image Hashing. IEEE Transactions on Image Processing 33 (1768-1781). DOI: 10.1109/TIP.2024.3371358. Online publication date: 5-Mar-2024
