
SANTM: Efficient Self-attention-driven Network for Text Matching

Published: 29 November 2021

Abstract

Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. A self-attention model takes only a single sentence as input, with no extra information; a sentence representation can then be obtained from the final hidden state or by pooling over the hidden states. However, text-matching problems can be interpreted in either a symmetrical or an asymmetrical scope. For instance, paraphrase detection is a symmetrical task, while textual entailment classification and question-answer matching are asymmetrical tasks. In this article, we leverage attractive properties of the self-attention mechanism and propose an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features, while updating the rest of the components. We evaluate our model on two benchmark datasets covering the tasks of textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching (SANTM) outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with far fewer parameters.
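The abstract builds on standard self-attention. SANTM's exact architecture is not reproduced on this page, but the scaled dot-product self-attention it extends can be sketched as follows; all variable names, dimensions, and the random projection matrices here are illustrative, not taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sentence.

    X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: (d_model, d_k) projections.
    Returns one attended vector per token, so no extra input sentence is needed.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)       # each row is a distribution over tokens
    return weights @ V                       # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one contextualized vector per input token
```

A sentence-level representation, as mentioned in the abstract, would then come from pooling `out` (e.g., `out.mean(axis=0)`) or taking its final row.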


Cited By

  • (2024) Integrated Model Text Classification Based on Multineural Networks. Electronics 13, 2 (453). DOI: 10.3390/electronics13020453. Online publication date: 22 Jan 2024.

Published In

ACM Transactions on Internet Technology  Volume 22, Issue 3
August 2022
631 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/3498359
Editor: Ling Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 November 2021
Accepted: 01 September 2020
Revised: 01 September 2020
Received: 01 July 2020
Published in TOIT Volume 22, Issue 3


Author Tags

  1. Text matching
  2. deep learning
  3. attention mechanism

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie
  • Soonchunhyang University Research Fund
  • Academy of Finland
  • Business Finland
  • EU H2020
