research-article

Matching Natural Language Sentences with Hierarchical Sentence Factorization

Authors:

Yu Xu

WWW '18: Proceedings of the 2018 World Wide Web Conference

Pages 1237 - 1246

https://doi.org/10.1145/3178876.3186022

Published: 10 April 2018

Abstract

Semantic matching of natural language sentences, or identifying the relationship between two sentences, is a core research problem underlying many natural language tasks. Depending on whether training data is available, prior research has proposed both unsupervised distance-based schemes and supervised deep learning schemes for sentence matching. However, previous approaches either omit or fail to fully utilize the ordered, hierarchical, and flexible structures of language objects, as well as the interactions between them. In this paper, we propose Hierarchical Sentence Factorization, a technique that factorizes a sentence into a hierarchical representation, with the components at each scale reordered into a "predicate-argument" form. The technique leads to two contributions: 1) a new unsupervised distance metric that calculates the semantic distance between a pair of text snippets by solving a penalized optimal transport problem while preserving the logical relationship of words in the reordered sentences, and 2) new multi-scale deep learning models for supervised semantic training, based on factorized sentence hierarchies. We apply our techniques to text-pair similarity estimation and text-pair relationship classification tasks on multiple datasets, including STSbenchmark, the Microsoft Research paraphrase identification (MSRP) dataset, and the SICK dataset. Extensive experiments show that the proposed hierarchical sentence factorization can significantly improve the performance of existing unsupervised distance-based metrics as well as multiple supervised deep learning models based on convolutional neural networks (CNN) and long short-term memory (LSTM).
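
To make the idea of a "penalized optimal transport" distance more concrete, the following is a minimal sketch and not the authors' formulation. It assumes each sentence is already given as an ordered array of word vectors, uses entropy-regularized transport solved with Sinkhorn-Knopp scaling, and adds a simple squared relative-position penalty to discourage matches that break word order. The function names, the form of the penalty, and the parameters `order_weight` and `reg` are all illustrative assumptions.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, reg=0.1, n_iters=200):
    """Entropy-regularized transport plan via Sinkhorn-Knopp scaling."""
    K = np.exp(-cost / reg)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
    return u[:, None] * K * v[None, :]


def ordered_transport_distance(X, Y, order_weight=0.5, reg=0.1):
    """Hypothetical order-penalized transport distance between two sentences.

    X, Y: arrays of shape (n, d) and (m, d), one word vector per row,
    in sentence order. Returns (distance, transport_plan).
    """
    n, m = len(X), len(Y)
    # Semantic cost: Euclidean distance between every pair of word vectors.
    semantic = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    # Assumed order penalty: squared gap between relative word positions.
    pos_x = (np.arange(n) + 0.5) / n
    pos_y = (np.arange(m) + 0.5) / m
    order = (pos_x[:, None] - pos_y[None, :]) ** 2
    cost = semantic + order_weight * order
    # Uniform word weights; a real system might weight words differently.
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    plan = sinkhorn_plan(cost, a, b, reg=reg, n_iters=200)
    return float(np.sum(plan * cost)), plan


# Toy usage: three orthogonal "word vectors", same order vs. reversed order.
X = np.eye(3)
d_same, _ = ordered_transport_distance(X, X.copy())
d_reversed, _ = ordered_transport_distance(X, X[::-1].copy())
print(d_same, d_reversed)
```

In this toy setup the reversed sentence contains the same words, so a plain bag-of-words transport cost would be near zero for both pairs. The position penalty is what separates them, which is the intuition behind preserving word order in the transport problem.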



Index Terms

Matching Natural Language Sentences with Hierarchical Sentence Factorization
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Lexical semantics


Published In

WWW '18: Proceedings of the 2018 World Wide Web Conference

April 2018

2000 pages

ISBN:9781450356398

General Chairs:
Pierre-Antoine Champin (Université Claude Bernard Lyon 1, France), Fabien Gandon (Inria, Université Côte d'Azur, CNRS, I3S, France), Lionel Médini (Université Claude Bernard Lyon 1, France)

Program Chairs:
Mounia Lalmas (Spotify, UK), Panagiotis G. Ipeirotis (New York University, USA)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Qualifiers

Research-article

Conference

WWW '18

Sponsor:

IW3C2

WWW '18: The Web Conference 2018

April 23 - 27, 2018

Lyon, France

Acceptance Rates

WWW '18 Paper Acceptance Rate 170 of 1,155 submissions, 15%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
