DOI: 10.1145/3459637.3482163
Short paper · Open access

Question Answering using Web Lists

Published: 30 October 2021

Abstract

There are many natural questions that are best answered with a list. We address the problem of answering such questions using lists that occur on the Web, i.e., List Question Answering (ListQA). The diverse formats of lists on the Web make this task challenging. We describe state-of-the-art methods for list extraction and ranking that also consider the text surrounding the lists as context. Due to the lack of realistic public datasets for ListQA, we present three novel datasets that together are realistic, reproducible, and test out-of-domain generalization. We benchmark the above steps on these datasets, with and without context. On the hardest setting (realistic and out-of-domain), we achieve an end-to-end Precision@1 of 51.28% and HITs@5 of 79.38%, demonstrating the difficulty of the task and quantifying the immediate opportunity for improvement. We highlight some future directions through error analysis and release the datasets for further research.
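
As context for the numbers above: Precision@1 and HITs@5 are standard ranking metrics over the candidate lists retrieved for each question. The sketch below is not code from the paper; the function names and toy data are hypothetical. It shows how these metrics are typically computed, assuming each ranked candidate list carries a binary relevance label.

```python
# Minimal sketch (assumption, not the paper's code) of end-to-end ListQA evaluation:
# for each question we have its ranked candidate lists, labeled 1 (correct answer
# list) or 0 (incorrect), and we compute Precision@1 and HITs@k over all questions.

from typing import Dict, List


def precision_at_1(ranked_labels: Dict[str, List[int]]) -> float:
    """Fraction of questions whose top-ranked candidate list is correct."""
    hits = sum(labels[0] == 1 for labels in ranked_labels.values() if labels)
    return hits / len(ranked_labels)


def hits_at_k(ranked_labels: Dict[str, List[int]], k: int = 5) -> float:
    """Fraction of questions with at least one correct list in the top k."""
    hits = sum(any(label == 1 for label in labels[:k])
               for labels in ranked_labels.values())
    return hits / len(ranked_labels)


if __name__ == "__main__":
    # Hypothetical toy data: question id -> relevance labels of its ranked candidates.
    toy = {
        "q1": [1, 0, 0, 0, 0],   # correct list ranked first
        "q2": [0, 0, 1, 0, 0],   # correct list ranked third
        "q3": [0, 0, 0, 0, 0],   # no correct list retrieved
    }
    print(f"Precision@1 = {precision_at_1(toy):.2%}")  # 33.33%
    print(f"HITs@5      = {hits_at_k(toy, 5):.2%}")    # 66.67%
```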

Supplementary Material

MP4 File (Recording #7.mp4)


Information

Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. list extraction
  2. list ranking
  3. question answering
  4. semi-structured

Qualifiers

  • Short-paper

Conference

CIKM '21

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions (22%)

Article Metrics

  • Total Citations: 0
  • Total Downloads: 385
  • Downloads (Last 12 months): 86
  • Downloads (Last 6 weeks): 19

Reflects downloads up to 14 Oct 2024
