DOI: 10.1145/3308558.3313502

Improved Cross-Lingual Question Retrieval for Community Question Answering

Published: 13 May 2019

Abstract

We perform cross-lingual question retrieval in community question answering (cQA), i.e., we retrieve similar questions for queries that are given in another language. The standard approach to cross-lingual information retrieval is to automatically translate the query into the target language and then apply a monolingual retrieval model. In cQA, this approach typically falls short because of translation errors, and even more so in specialized domains such as technical cQA, which we explore in this work. To remedy this, we propose two extensions that improve cross-lingual question retrieval: (1) we enhance an NMT model with monolingual cQA data to improve translation quality, and (2) we make a state-of-the-art neural question retrieval model more robust to common translation errors by adding back-translations during training. Our results show substantial improvements over the baseline approach. They also considerably close the gap to a setup with access to an external commercial machine translation service (i.e., Google Translate), which is often unavailable in practical scenarios. Our source code and data are publicly available.
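The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dictionary-based toy translators, the Jaccard scorer, and all function names (`back_translate`, `augment_with_back_translations`, `retrieve`) are assumptions made for the example. In the paper, an NMT model does the translating and a neural model does the retrieval. Which side of a training pair is back-translated is also an assumption here.

```python
from typing import Callable, List, Tuple

# A translator is any function from text to text (hypothetical stand-in for an NMT model).
Translator = Callable[[str], str]


def toy_translator(table: dict) -> Translator:
    """Build a lookup-based translator; unknown inputs pass through unchanged."""
    return lambda text: table.get(text, text)


def back_translate(question: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Round-trip a question through a pivot language to obtain a noisy variant."""
    return from_pivot(to_pivot(question))


def augment_with_back_translations(
    pairs: List[Tuple[str, str]], to_pivot: Translator, from_pivot: Translator
) -> List[Tuple[str, str]]:
    """Keep every (query, similar question) pair and add a back-translated copy of the query."""
    augmented = []
    for query, similar in pairs:
        augmented.append((query, similar))
        noisy = back_translate(query, to_pivot, from_pivot)
        if noisy != query:  # skip round trips that changed nothing
            augmented.append((noisy, similar))
    return augmented


def jaccard(a: str, b: str) -> float:
    """Token-overlap score used here in place of a neural retrieval model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def retrieve(query: str, candidates: List[str], translate_query: Translator,
             score: Callable[[str, str], float]) -> List[str]:
    """Translate the query into the target language, then rank candidates monolingually."""
    translated = translate_query(query)
    return sorted(candidates, key=lambda c: score(translated, c), reverse=True)


# Toy data (invented for illustration only).
en_to_de = toy_translator({"how do i install pip": "wie installiere ich pip"})
de_to_en = toy_translator({"wie installiere ich pip": "how can i install pip"})

train_pairs = [("how do i install pip", "installing pip on ubuntu")]
augmented_pairs = augment_with_back_translations(train_pairs, en_to_de, de_to_en)

ranking = retrieve("wie installiere ich pip",
                   ["undo a git commit", "how to install pip"],
                   de_to_en, jaccard)
```

The augmented training set exposes the retrieval model to the kind of wording shifts a translation system introduces, so that a translated query at test time looks less unfamiliar.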


Published In

WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN: 9781450366748
DOI: 10.1145/3308558

In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Community Question Answering
  2. Cross-lingual Retrieval
  3. Neural Machine Translation
  4. Question Retrieval
  5. Representation Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19: The Web Conference
May 13 - 17, 2019
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2024) The power and potentials of Flexible Query Answering Systems. Data & Knowledge Engineering 149:C. DOI: 10.1016/j.datak.2023.102246. Online publication date: 1-Jan-2024
  • (2022) More Gamification Is Not Always Better. Proceedings of the ACM on Human-Computer Interaction 6:CSCW2 (1-32). DOI: 10.1145/3555553. Online publication date: 11-Nov-2022
  • (2022) HeteroQA. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (307-315). DOI: 10.1145/3488560.3498378. Online publication date: 11-Feb-2022
  • (2022) On the analysis and evaluation of information retrieval models for social book search. Multimedia Tools and Applications 82:5 (6431-6478). DOI: 10.1007/s11042-022-13417-7. Online publication date: 27-Jul-2022
  • (2021) Cross-lingual Language Model Pretraining for Retrieval. Proceedings of the Web Conference 2021 (1029-1039). DOI: 10.1145/3442381.3449830. Online publication date: 19-Apr-2021
  • (2021) Hierarchical deep multi-modal network for medical visual question answering. Expert Systems with Applications 164 (113993). DOI: 10.1016/j.eswa.2020.113993. Online publication date: Feb-2021
  • (2021) Towards End-to-End Multilingual Question Answering. Information Systems Frontiers 23:1 (227-241). DOI: 10.1007/s10796-020-09996-1. Online publication date: 1-Feb-2021
  • (2020) Cross-lingual embedding for cross-lingual question retrieval in low-resource community question answering. Machine Translation 34:4 (287-303). DOI: 10.1007/s10590-020-09257-7. Online publication date: 1-Dec-2020
