Improved Cross-Lingual Question Retrieval for Community Question Answering

Authors: Andreas Rücklé, Krishnkant Swarnkar, Iryna Gurevych

WWW '19: The World Wide Web Conference

Pages 3179 - 3186

https://doi.org/10.1145/3308558.3313502

Published: 13 May 2019

Abstract

We perform cross-lingual question retrieval in community question answering (cQA), i.e., we retrieve similar questions for queries that are given in another language. The standard approach to cross-lingual information retrieval, which is to automatically translate the query to the target language and continue with a monolingual retrieval model, typically falls short in cQA due to translation errors. This holds even more for specialized domains such as technical cQA, which we explore in this work. To remedy this, we propose two extensions to the standard approach that improve cross-lingual question retrieval: (1) we enhance an NMT model with monolingual cQA data to improve the translation quality, and (2) we improve the robustness of a state-of-the-art neural question retrieval model to common translation errors by adding back-translations during training. Our results show that we achieve substantial improvements over the baseline approach and considerably close the gap to a setup with access to an external commercial machine translation service (i.e., Google Translate), which is often unavailable in practical scenarios. Our source code and data are publicly available.
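To illustrate extension (2), the following is a minimal Python sketch of back-translation augmentation for retrieval training data. It is not the authors' implementation: the function names, the choice to back-translate only the query side, and the toy string-replacement "translators" are all assumptions made for illustration. In practice the two translators would be real NMT models.

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]
Pair = Tuple[str, str]  # (query question, similar question)


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Round-trip a text through a pivot language to obtain a noisy variant."""
    return from_pivot(to_pivot(text))


def augment_with_back_translations(
    pairs: List[Pair], to_pivot: Translator, from_pivot: Translator
) -> List[Pair]:
    """Keep every original pair and add a copy whose query is back-translated.

    Assumption: only the query side is perturbed; the abstract does not say
    which side the authors back-translate.
    """
    augmented = list(pairs)
    for query, similar in pairs:
        noisy_query = back_translate(query, to_pivot, from_pivot)
        if noisy_query != query:  # skip round trips that changed nothing
            augmented.append((noisy_query, similar))
    return augmented


# Hypothetical toy stand-ins for NMT models (illustration only).
def toy_en_to_de(text: str) -> str:
    return text.replace("remove", "entfernen").replace("folder", "Ordner")


def toy_de_to_en(text: str) -> str:
    return text.replace("entfernen", "delete").replace("Ordner", "directory")


train_pairs = [("how to remove a folder", "deleting a directory in bash")]
augmented_pairs = augment_with_back_translations(
    train_pairs, toy_en_to_de, toy_de_to_en
)
```

Training on both the original and the round-tripped queries exposes the retrieval model to the kind of lexical drift that machine-translated queries exhibit at test time, which is the intuition behind the robustness gain described above.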

Published In

WWW '19: The World Wide Web Conference

May 2019

3620 pages

ISBN:9781450366748

DOI:10.1145/3308558

Editors:
Ling Liu
Georgia Tech, USA
,
Ryen White
Microsoft Research, USA

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '19

WWW '19: The Web Conference

May 13 - 17, 2019

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
