
SANTM: Efficient Self-attention-driven Network for Text Matching

Published: 29 November 2021

Abstract

Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. A self-attention model takes only a single sentence as input, with no extra information; a sentence representation can then be obtained from the final hidden state or by pooling over the hidden states. However, text-matching problems can be interpreted in either a symmetrical or an asymmetrical scope. For instance, paraphrase detection is a symmetrical task, while textual entailment classification and question-answer matching are asymmetrical tasks. In this article, we leverage attractive properties of the self-attention mechanism and propose an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features, while updating the rest of the components. We evaluate our model on two benchmark datasets covering the tasks of textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching (SANTM) outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with far fewer parameters.
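The abstract builds on standard self-attention. SANTM's exact architecture is not reproduced on this page, but the scaled dot-product self-attention it extends can be sketched as follows; all variable names, dimensions, and the random projection matrices here are illustrative, not taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sentence.

    X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: (d_model, d_k) projections.
    Returns one attended vector per token, so no extra input sentence is needed.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)       # each row is a distribution over tokens
    return weights @ V                       # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one contextualized vector per input token
```

A sentence-level representation, as mentioned in the abstract, would then come from pooling `out` (e.g., `out.mean(axis=0)`) or taking its final row.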


Cited By

  • (2024) Integrated Model Text Classification Based on Multineural Networks. Electronics 13, 2 (453). DOI: 10.3390/electronics13020453. Online publication date: 22 Jan 2024.

Published In

ACM Transactions on Internet Technology  Volume 22, Issue 3
August 2022
631 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/3498359
Editor: Ling Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 November 2021
Accepted: 01 September 2020
Revised: 01 September 2020
Received: 01 July 2020
Published in TOIT Volume 22, Issue 3


Author Tags

  1. Text matching
  2. deep learning
  3. attention mechanism

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie
  • Soonchunhyang University Research Fund
  • Academy of Finland
  • Business Finland
  • EU H2020
