DOI: 10.1145/3178876.3186022

Matching Natural Language Sentences with Hierarchical Sentence Factorization

Published: 10 April 2018

Abstract

Semantic matching of natural language sentences, or identifying the relationship between two sentences, is a core research problem underlying many natural language tasks. Depending on whether training data is available, prior research has proposed both unsupervised distance-based schemes and supervised deep learning schemes for sentence matching. However, previous approaches either omit or fail to fully utilize the ordered, hierarchical, and flexible structures of language objects, as well as the interactions between them. In this paper, we propose Hierarchical Sentence Factorization, a technique to factorize a sentence into a hierarchical representation, with the components at each scale reordered into a "predicate-argument" form. The proposed sentence factorization technique leads to: 1) a new unsupervised distance metric, which calculates the semantic distance between a pair of text snippets by solving a penalized optimal transport problem while preserving the logical relationship of words in the reordered sentences, and 2) new multi-scale deep learning models for supervised semantic training, based on factorized sentence hierarchies. We apply our techniques to text-pair similarity estimation and text-pair relationship classification tasks on multiple datasets, including STSbenchmark, the Microsoft Research paraphrase identification (MSRP) dataset, and the SICK dataset. Extensive experiments show that the proposed hierarchical sentence factorization significantly improves the performance of existing unsupervised distance-based metrics as well as multiple supervised deep learning models based on the convolutional neural network (CNN) and long short-term memory (LSTM).
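
To make the idea of a "penalized optimal transport" distance more concrete, the sketch below computes an entropy-regularized transport cost between two sequences of word vectors and adds a penalty on matching words that sit at very different relative positions. It is only an illustration under stated assumptions and is not the paper's formulation: the function names `sinkhorn_plan` and `ordered_transport_distance`, the squared-position penalty, the uniform word weights, and the parameters `order_weight` and `reg` are all hypothetical choices made for this example, and the inputs are assumed to be word embeddings for sentences that have already been reordered.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, reg=0.1, n_iters=500):
    """Approximate an optimal transport plan with Sinkhorn-Knopp scaling.

    cost: (n, m) cost matrix; a, b: marginal weights that each sum to 1.
    """
    K = np.exp(-cost / reg)          # Gibbs kernel of the regularized problem
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]


def ordered_transport_distance(X, Y, order_weight=0.5, reg=0.1):
    """Transport cost between two word-vector sequences with an order penalty.

    X: (n, d) and Y: (m, d) arrays of word embeddings, in sentence order.
    The order penalty used here is an assumption made for illustration.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = len(X), len(Y)

    # Semantic cost: Euclidean distance between every pair of word vectors.
    semantic = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)

    # Order cost: squared gap between the relative positions of the two words.
    pos_x = (np.arange(n) + 0.5) / n
    pos_y = (np.arange(m) + 0.5) / m
    order = (pos_x[:, None] - pos_y[None, :]) ** 2

    cost = semantic + order_weight * order
    a = np.full(n, 1.0 / n)          # uniform weight per word (assumption)
    b = np.full(m, 1.0 / m)
    plan = sinkhorn_plan(cost, a, b, reg=reg, n_iters=500)
    return float((plan * cost).sum()), plan


# Toy usage with made-up 2-dimensional "embeddings".
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
d_same, _ = ordered_transport_distance(X, X.copy())
d_reversed, _ = ordered_transport_distance(X, X[::-1].copy())
```

With `order_weight=0.0` the sketch reduces to a plain bag-of-words transport cost, which cannot tell a sentence from its reversal; a positive `order_weight` makes the reversed sequence more distant than the identical one.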

    Published In

    WWW '18: Proceedings of the 2018 World Wide Web Conference
    April 2018
    2000 pages
    ISBN:9781450356398

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. abstract meaning representation
    2. hierarchical sentence factorization
    3. ordered word mover's distance
    4. semantic matching
    5. sentence reordering

    Qualifiers

    • Research-article

    Conference

    WWW '18
    Sponsor:
    • IW3C2
    WWW '18: The Web Conference 2018
    April 23 - 27, 2018
    Lyon, France

    Acceptance Rates

    WWW '18 Paper Acceptance Rate 170 of 1,155 submissions, 15%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
