DOI: 10.1145/3366423.3380150
Research article

Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder

Published: 20 April 2020

Abstract

Searching for documents with semantically similar content is a fundamental problem in information retrieval, and its main challenges are efficiency and effectiveness. Although modeling structured dependencies in documents is promising, several existing text hashing methods lack an efficient mechanism for incorporating this information. In addition, existing methods do not effectively learn the desired characteristics of an ideal hash function, such as robustness to noise, low quantization error, and balanced, uncorrelated bits, because they require either tuning additional hyper-parameters or optimizing heuristically and explicitly constructed cost functions. In this paper, we propose the Denoising Adversarial Binary Autoencoder (DABA), a novel representation learning framework that captures the structured representation of text documents in the learned hash function. Adversarial training also provides an alternative way to implicitly learn a hash function with all of these desired characteristics. Specifically, DABA adopts a novel single-optimization adversarial training procedure that minimizes the Wasserstein distance in its primal domain to regularize the encoder output of either a recurrent or a convolutional autoencoder. We empirically demonstrate the effectiveness of the proposed method in capturing the intrinsic semantic manifold of related documents. The proposed method outperforms current state-of-the-art shallow and deep unsupervised hashing methods on the document retrieval task across several prominent document collections.
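The hash-function properties named in the abstract (low quantization error, bit balance, bit uncorrelation) can be measured directly on a set of codes. The sketch below is illustrative only and is not the DABA implementation: the function names, the 0.5 threshold, the assumption that encoder outputs lie in [0, 1], and the toy data are all hypothetical. It also shows Hamming-distance lookup, the usual way binary codes are searched.

```python
from itertools import combinations


def binarize(outputs, threshold=0.5):
    """Quantize real-valued outputs (assumed to lie in [0, 1]) to bits."""
    return [[1 if v >= threshold else 0 for v in row] for row in outputs]


def quantization_error(outputs, codes):
    """Mean squared gap between the continuous outputs and their bits."""
    total = sum(
        (v - b) ** 2
        for out_row, code_row in zip(outputs, codes)
        for v, b in zip(out_row, code_row)
    )
    return total / (len(outputs) * len(outputs[0]))


def bit_balance(codes):
    """Fraction of documents with each bit set; 0.5 means balanced."""
    n = len(codes)
    return [sum(column) / n for column in zip(*codes)]


def max_abs_bit_correlation(codes):
    """Largest absolute Pearson correlation over all pairs of bits."""
    n = len(codes)
    columns = [list(column) for column in zip(*codes)]
    worst = 0.0
    for a, b in combinations(columns, 2):
        mean_a, mean_b = sum(a) / n, sum(b) / n
        cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
        var_a = sum((x - mean_a) ** 2 for x in a) / n
        var_b = sum((y - mean_b) ** 2 for y in b) / n
        if var_a == 0 or var_b == 0:
            continue  # a constant bit has no defined correlation
        worst = max(worst, abs(cov) / (var_a * var_b) ** 0.5)
    return worst


def hamming(a, b):
    """Number of bit positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(query, codes, k=1):
    """Indices of the k codes closest to the query in Hamming distance."""
    order = sorted(range(len(codes)), key=lambda i: hamming(query, codes[i]))
    return order[:k]


# Toy data: four documents, two-bit codes (made up for illustration).
toy_outputs = [[0.9, 0.1], [0.8, 0.9], [0.2, 0.2], [0.1, 0.7]]
toy_codes = binarize(toy_outputs)

print(toy_codes)                           # [[1, 0], [1, 1], [0, 0], [0, 1]]
print(bit_balance(toy_codes))              # [0.5, 0.5]
print(max_abs_bit_correlation(toy_codes))  # 0.0
print(nearest([1, 1], toy_codes, k=2))     # [1, 0]
```

On the toy data every bit is set in exactly half of the documents and the two bits are uncorrelated, which is the ideal case the abstract describes. Explicit-cost approaches would add penalty terms for these quantities to the training loss, whereas the abstract states that DABA learns them implicitly through adversarial training.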

        Published In

        WWW '20: Proceedings of The Web Conference 2020
        April 2020
        3143 pages
        ISBN:9781450370233
        DOI:10.1145/3366423
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Hashing
        2. adversarial training
        3. autoencoder
        4. deep learning

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        WWW '20: The Web Conference 2020
        April 20 - 24, 2020
        Taipei, Taiwan

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

        Bibliometrics & Citations

        Article Metrics

        • Downloads (last 12 months): 20
        • Downloads (last 6 weeks): 2
        Reflects downloads up to 11 Jan 2025

        Cited By

        • (2024) POLISH: Adaptive Online Cross-Modal Hashing for Class Incremental Data. Proceedings of the ACM Web Conference 2024, 4470-4478. DOI: 10.1145/3589334.3645716. Online publication date: 13-May-2024
        • (2024) Bit-mask Robust Contrastive Knowledge Distillation for Unsupervised Semantic Hashing. Proceedings of the ACM Web Conference 2024, 1395-1406. DOI: 10.1145/3589334.3645440. Online publication date: 13-May-2024
        • (2023) An Efficient and Robust Semantic Hashing Framework for Similar Text Search. ACM Transactions on Information Systems 41:4, 1-31. DOI: 10.1145/3570725. Online publication date: 22-Mar-2023
        • (2023) Unified Energy-Based Generative Network for Supervised Image Hashing. Computer Vision – ACCV 2022, 527-543. DOI: 10.1007/978-3-031-26351-4_32. Online publication date: 26-Feb-2023
        • (2022) Intra-category Aware Hierarchical Supervised Document Hashing. IEEE Transactions on Knowledge and Data Engineering, 1-1. DOI: 10.1109/TKDE.2022.3161807. Online publication date: 2022
        • (2022) One Loss for Quantization: Deep Hashing with Discrete Wasserstein Distributional Matching. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9437-9447. DOI: 10.1109/CVPR52688.2022.00923. Online publication date: Jun-2022
        • (2021) LIRA: Learnable, Imperceptible and Robust Backdoor Attacks. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 11946-11956. DOI: 10.1109/ICCV48922.2021.01175. Online publication date: Oct-2021
