Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder

Authors:

Chandan K. Reddy

WWW '20: Proceedings of The Web Conference 2020

Pages 684 - 694

https://doi.org/10.1145/3366423.3380150

Published: 20 April 2020

Abstract

Searching for documents with semantically similar content is a fundamental problem in information retrieval, and its main challenges are efficiency and effectiveness. Although modeling structured dependencies in documents is promising, several existing text hashing methods lack an efficient mechanism to incorporate this information. In addition, existing methods do not effectively learn the desired characteristics of an ideal hash function, such as robustness to noise, low quantization error, and bit balance and uncorrelation, because they require either tuning additional hyper-parameters or optimizing heuristically and explicitly constructed cost functions. In this paper, we propose the Denoising Adversarial Binary Autoencoder (DABA), a novel representation learning framework that captures the structured representation of text documents in the learned hash function. Adversarial training also provides an alternative way to implicitly learn a hash function that has all of these desired characteristics. Specifically, DABA adopts a novel single-optimization adversarial training procedure that minimizes the Wasserstein distance in its primal domain to regularize the encoder output of either a recurrent neural network or a convolutional autoencoder. We empirically demonstrate that the proposed method captures the intrinsic semantic manifold of related documents. It outperforms current state-of-the-art shallow and deep unsupervised hashing methods for the document retrieval task on several prominent document collections.
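To make the idea of a primal-domain Wasserstein regularizer concrete, the following minimal sketch computes the exact 1-Wasserstein distance between a small batch of encoder outputs and samples from a binary prior. It is an illustration only, not the authors' implementation: the function names, the L1 ground cost, the fair-bit prior, and the random stand-in for encoder outputs are all assumptions. For two equal-size, uniformly weighted batches the optimal transport plan is a one-to-one matching, so the sketch simply enumerates all matchings, which is only feasible for tiny batches.

```python
import itertools
import random


def l1_cost(u, v):
    """L1 distance between two equal-length vectors (assumed ground cost)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def primal_wasserstein(xs, ys):
    """Exact 1-Wasserstein distance between two equal-size, uniformly
    weighted point sets, found by brute force over all one-to-one matchings.

    The search visits n! matchings, so this is for illustration only.
    """
    n = len(xs)
    assert n == len(ys), "both batches must have the same size"
    best_total = min(
        sum(l1_cost(xs[i], ys[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best_total / n


def sample_binary_prior(n, bits, rng):
    """Hypothetical prior: n codes of independent fair bits."""
    return [[float(rng.random() < 0.5) for _ in range(bits)] for _ in range(n)]


rng = random.Random(0)
# Random stand-in for sigmoid encoder outputs: 5 documents, 4 bits each.
encoder_out = [[rng.random() for _ in range(4)] for _ in range(5)]
prior = sample_binary_prior(5, 4, rng)

# The penalty shrinks as the batch of encoder outputs moves toward binary,
# prior-like codes; in training it would be added to the reconstruction loss.
penalty = primal_wasserstein(encoder_out, prior)
print(f"Wasserstein penalty: {penalty:.3f}")
```

A practical implementation would replace the brute-force search with a polynomial-time assignment solver and would backpropagate through the matched pairs; both of those choices are outside what the abstract specifies.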

Published In

WWW '20: Proceedings of The Web Conference 2020

April 2020, 3143 pages

ISBN: 9781450370233

DOI: 10.1145/3366423

Editors: Yennun Huang (Academia Sinica, Taiwan), Irwin King (The Chinese University of Hong Kong, Hong Kong), Tie-Yan Liu (Microsoft Research Asia, China), Maarten van Steen (University of Twente, Netherlands)

Copyright © 2020 ACM.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '20

Sponsor:

SIGWEB

WWW '20: The Web Conference 2020

April 20 - 24, 2020

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
