poster

Using natural language to integrate, evaluate, and optimize extracted knowledge bases

Authors:

Chandra Sekhar Bhagavatula,

Alexander Yates

AKBC '13: Proceedings of the 2013 workshop on Automated knowledge base construction

Pages 61 - 66

https://doi.org/10.1145/2509558.2509569

Published: 27 October 2013

Abstract

Web Information Extraction (WIE) systems extract billions of unique facts, but integrating these assertions into a coherent knowledge base and evaluating the output of different WIE techniques against one another remain challenges. We propose a framework that uses natural language to integrate and evaluate extracted knowledge bases (KBs). In the framework, KBs are integrated by exchanging probability distributions over natural language and are evaluated by how well their output distributions predict held-out text. We describe the advantages of the approach and detail the remaining research challenges.
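The abstract describes integration and evaluation only at a high level. The sketch below is one possible reading under explicit assumptions, not the authors' method: each KB is treated as a model that, given a natural-language context such as "Paris is the capital of", returns a probability distribution over completions. The `KBModel` interface, the `mixture` weights, the `floor` constant, and the toy facts are all hypothetical. Integration is illustrated as a linear mixture of the KBs' distributions, and evaluation as perplexity on held-out (context, completion) pairs, where lower is better.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a KB is queried with a natural-language context
# and returns a probability distribution over candidate completions.
Distribution = Dict[str, float]
KBModel = Callable[[str], Distribution]


def mixture(models: List[KBModel], weights: List[float]) -> KBModel:
    """Integrate KBs by linearly interpolating their output distributions."""
    total = sum(weights)
    normalized = [w / total for w in weights]

    def combined(context: str) -> Distribution:
        out: Distribution = {}
        for model, w in zip(models, normalized):
            for completion, p in model(context).items():
                out[completion] = out.get(completion, 0.0) + w * p
        return out

    return combined


def heldout_log_likelihood(model: KBModel,
                           heldout: List[Tuple[str, str]],
                           floor: float = 1e-6) -> float:
    """Average log-probability the model assigns to held-out completions.

    `floor` is a crude stand-in for proper smoothing: it keeps completions
    a KB knows nothing about from producing log(0).
    """
    total = 0.0
    for context, answer in heldout:
        p = model(context).get(answer, 0.0)
        total += math.log(max(p, floor))
    return total / len(heldout)


def perplexity(model: KBModel, heldout: List[Tuple[str, str]],
               floor: float = 1e-6) -> float:
    """Exp of the negative average held-out log-likelihood (lower is better)."""
    return math.exp(-heldout_log_likelihood(model, heldout, floor))


# Two toy KBs that each cover a different fact (invented for illustration);
# kb_a also carries a noisy alternative extraction.
def kb_a(context: str) -> Distribution:
    facts = {"Paris is the capital of": {"France": 0.9, "Texas": 0.1}}
    return facts.get(context, {})


def kb_b(context: str) -> Distribution:
    facts = {"Tokyo is the capital of": {"Japan": 1.0}}
    return facts.get(context, {})


heldout = [("Paris is the capital of", "France"),
           ("Tokyo is the capital of", "Japan")]
merged = mixture([kb_a, kb_b], [0.5, 0.5])

# The merged model should score the lowest perplexity of the three.
for name, model in [("kb_a", kb_a), ("kb_b", kb_b), ("merged", merged)]:
    print(name, round(perplexity(model, heldout), 3))
```

In this toy setup neither KB alone predicts both held-out sentences, so the mixture scores a much lower perplexity than either KB. Two simplifications are worth flagging: a KB with no distribution for a context contributes no probability mass, so the mixture can be sub-normalized, and `floor` is not a proper smoothing scheme; a real system would need normalized fallback distributions for both.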

Cited By

Zhu X, Klabjan D, Bless P (2017). Unsupervised Terminological Ontology Learning Based on Hierarchical Topic Modeling. 2017 IEEE International Conference on Information Reuse and Integration (IRI), pp. 32-41. Online publication date: Aug-2017. https://doi.org/10.1109/IRI.2017.18

Wani S, Wahiddin M, Sembok T (2016). Logico-linguistic Semantic Representation of Documents. 2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), pp. 773-780. Online publication date: Aug-2016. https://doi.org/10.1109/DASC-PICom-DataCom-CyberSciTec.2016.135

Index Terms

Using natural language to integrate, evaluate, and optimize extracted knowledge bases
1. Information systems
  1. Information retrieval
  2. World Wide Web
    1. Web applications
    2. Web services

Information

Published In

AKBC '13: Proceedings of the 2013 workshop on Automated knowledge base construction

October 2013

124 pages

ISBN:9781450324113

DOI:10.1145/2509558

Program Chairs:
Fabian M. Suchanek, Max Planck Institute for Informatics, Germany
Sebastian Riedel, University College London, UK
Sameer Singh, University of Massachusetts Amherst, USA
Partha Pratim Talukdar, Carnegie Mellon University, USA

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

CIKM'13

Sponsor:

CIKM'13: 22nd ACM International Conference on Information and Knowledge Management

October 27-28, 2013

San Francisco, California, USA

Acceptance Rates

AKBC '13 Paper Acceptance Rate 9 of 19 submissions, 47%

Overall Acceptance Rate 9 of 19 submissions, 47%

Bibliometrics

Total Citations: 2
Total Downloads: 129
Downloads (last 12 months): 1
Downloads (last 6 weeks): 0

Reflects downloads up to 03 Oct 2024
