DOI: 10.1145/2509558.2509569
poster

Using natural language to integrate, evaluate, and optimize extracted knowledge bases

Published: 27 October 2013

Abstract

Web Information Extraction (WIE) systems extract billions of unique facts, but integrating the assertions into a coherent knowledge base and evaluating across different WIE techniques remain a challenge. We propose a framework that uses natural language to integrate and evaluate extracted knowledge bases (KBs). In the framework, KBs are integrated by exchanging probability distributions over natural language, and evaluated by how well the output distributions predict held-out text. We describe the advantages of the approach, and detail remaining research challenges.
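
The two operations named in the abstract, integrating KBs through probability distributions over text and scoring them on held-out text, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: each KB is stood in for by a smoothed unigram model over a small hypothetical "verbalized" corpus, integration is plain linear interpolation, and the score is held-out perplexity.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def mix(p, q, weight=0.5):
    """Linear interpolation of two distributions over the same vocabulary."""
    return {w: weight * p[w] + (1.0 - weight) * q[w] for w in p}


def perplexity(model, heldout):
    """Perplexity of held-out tokens under the model (lower is better)."""
    log_prob = sum(math.log(model[w]) for w in heldout)
    return math.exp(-log_prob / len(heldout))


# Hypothetical text "verbalized" from two toy knowledge bases (assumption).
kb_a_text = "paris is a city france is a country".split()
kb_b_text = "einstein is a physicist curie is a chemist".split()
heldout = "curie is a physicist paris is a city".split()

vocab = set(kb_a_text) | set(kb_b_text) | set(heldout)
model_a = unigram_model(kb_a_text, vocab)
model_b = unigram_model(kb_b_text, vocab)
combined = mix(model_a, model_b)  # stand-in for "integration"

for name, model in [("A", model_a), ("B", model_b), ("A+B", combined)]:
    print(name, round(perplexity(model, heldout), 2))
```

In this toy setting the interpolated model assigns higher probability to the held-out sentence than either source alone, which is the kind of signal a held-out-text evaluation would reward. A realistic system would need far richer language models than unigrams.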

Cited By

  • (2017) Unsupervised Terminological Ontology Learning Based on Hierarchical Topic Modeling. 2017 IEEE International Conference on Information Reuse and Integration (IRI), pages 32-41. DOI: 10.1109/IRI.2017.18. Online publication date: Aug-2017
  • (2016) Logico-linguistic Semantic Representation of Documents. 2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), pages 773-780. DOI: 10.1109/DASC-PICom-DataCom-CyberSciTec.2016.135. Online publication date: Aug-2016

        Published In

        AKBC '13: Proceedings of the 2013 workshop on Automated knowledge base construction
        October 2013
        124 pages
        ISBN:9781450324113
        DOI:10.1145/2509558

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. knowledge extraction
        2. knowledge integration
        3. language modeling

        Qualifiers

        • Poster

        Conference

        CIKM'13

        Acceptance Rates

        AKBC '13 Paper Acceptance Rate 9 of 19 submissions, 47%;
        Overall Acceptance Rate 9 of 19 submissions, 47%
