DOI: 10.1145/3488560.3498488
research-article
Open access

Beyond NED: Fast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases

Published: 15 February 2022

Abstract

Answering complex questions over knowledge bases (KB-QA) faces huge input data with billions of facts, involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by identifying a set of facts that is likely to contain all answers and relevant cues. The most common technique for doing this is to apply named entity disambiguation (NED) on the question, and retrieve KB facts for the disambiguated entities. This work presents CLOCQ, an efficient method that prunes irrelevant parts of the search space using KB-aware signals. CLOCQ uses a top-k query processor over score-ordered lists of KB items that combine signals about lexical matching, relevance to the question, coherence among candidate items, and connectivity in the KB graph. Experiments with two recent QA benchmarks for complex questions demonstrate the superiority of CLOCQ over state-of-the-art baselines with respect to answer presence, size of the search space, and runtimes.
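The top-k processing over score-ordered lists mentioned in the abstract can be illustrated with a generic threshold-style algorithm in the spirit of Fagin et al. The sketch below is not CLOCQ's actual implementation: the function name, the item names, the scores, the equal weights, and the weighted-sum aggregation are all assumptions made for illustration. It also assumes non-negative scores and weights, and that an item missing from a list scores 0 for that signal.

```python
from typing import Dict, List, Tuple

# One list per signal: (item, score) pairs sorted by score, highest first.
ScoreList = List[Tuple[str, float]]


def threshold_top_k(
    lists: Dict[str, ScoreList], weights: Dict[str, float], k: int
) -> List[Tuple[str, float]]:
    """Return the k items with the highest weighted-sum score.

    Generic threshold-style top-k; stops reading the lists as soon as
    no unseen item can still enter the top k.
    """
    # Random access: signal -> item -> score (missing items count as 0).
    lookup = {name: dict(entries) for name, entries in lists.items()}

    def aggregate(item: str) -> float:
        return sum(weights[n] * lookup[n].get(item, 0.0) for n in lists)

    def best(seen: Dict[str, float]) -> List[Tuple[str, float]]:
        # Sort by score descending, break ties by item name.
        return sorted(seen.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

    seen: Dict[str, float] = {}
    max_depth = max(len(entries) for entries in lists.values())
    for depth in range(max_depth):
        frontier: Dict[str, float] = {}
        # Sorted access: read one entry per list at the current depth.
        for name, entries in lists.items():
            if depth < len(entries):
                item, score = entries[depth]
                frontier[name] = score
                if item not in seen:
                    seen[item] = aggregate(item)
            else:
                frontier[name] = 0.0  # list exhausted
        # Upper bound on the score of any item not seen so far.
        threshold = sum(weights[n] * frontier[n] for n in lists)
        top = best(seen)
        if len(top) == k and top[-1][1] >= threshold:
            return top  # early termination
    return best(seen)


# Hypothetical candidates for one question term, one list per signal.
signal_lists: Dict[str, ScoreList] = {
    "lexical_match": [("item_a", 1.0), ("item_b", 0.5), ("item_c", 0.25)],
    "question_relevance": [("item_b", 1.0), ("item_a", 0.75), ("item_c", 0.25)],
    "coherence": [("item_a", 1.0), ("item_c", 0.5), ("item_b", 0.25)],
    "connectivity": [("item_a", 0.75), ("item_b", 0.5), ("item_c", 0.5)],
}
signal_weights = {name: 0.25 for name in signal_lists}

if __name__ == "__main__":
    print(threshold_top_k(signal_lists, signal_weights, k=2))
```

With this toy input the loop stops after reading two entries per list, because the second-best aggregated score already matches the upper bound for unseen items. Early termination of this kind is what makes top-k processing cheaper than scoring every candidate.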

Supplementary Material

MP4 File (WSDM22-fp621.mp4)
Answering complex questions over knowledge bases (KB-QA) faces huge input data with billions of facts, involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by identifying a set of facts that is likely to contain all answers and relevant cues. The most common technique for doing this is to apply named entity disambiguation (NED) systems to the question, and retrieve KB facts for the disambiguated entities. We identify several problems with this standard approach, and propose CLOCQ as an end-to-end method for search space reduction, going beyond NED. We show that CLOCQ enhances the answer presence in the reduced search space. Finally, we make CLOCQ available to the community for both search space reduction and efficient KB access. We hope that this helps to overcome obstacles (such as 2 TB of data) to getting started with KB-QA.

    Published In

    WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
    February 2022
    1690 pages
    ISBN:9781450391320
    DOI:10.1145/3488560
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. entity linking
    2. knowledge bases
    3. question answering

    Qualifiers

    • Research-article

    Funding Sources

    • ERC Synergy Grant

    Conference

    WSDM '22

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 189
    • Downloads (last 6 weeks): 18
    Reflects downloads up to 16 Oct 2024

    Cited By

    • (2024) Uniqorn: Unified question answering over RDF knowledge graphs and natural language text. Journal of Web Semantics 83 (100833). DOI: 10.1016/j.websem.2024.100833. Online publication date: Dec-2024
    • (2023) CricGPT: A GPT-aided Question-Answering system for Cricket. Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation (44-50). DOI: 10.1145/3632754.3632757. Online publication date: 15-Dec-2023
    • (2023) Techniques, datasets, evaluation metrics and future directions of a question answering system. Knowledge and Information Systems 66:4 (2235-2268). DOI: 10.1007/s10115-023-02019-w. Online publication date: 22-Dec-2023
    • (2023) Semantic Parsing for Knowledge Graph Question Answering with Large Language Models. The Semantic Web: ESWC 2023 Satellite Events (234-243). DOI: 10.1007/978-3-031-43458-7_42. Online publication date: 28-May-2023
