Probase: a probabilistic taxonomy for text understanding

Published: 20 May 2012

Abstract

Knowledge is indispensable to understanding. The ongoing information explosion highlights the need to enable machines to better understand electronic text in human language. Much work has been devoted to creating universal ontologies or taxonomies for this purpose. However, none of the existing ontologies has the needed depth and breadth for universal understanding. In this paper, we present a universal, probabilistic taxonomy that is more comprehensive than any existing one. It contains 2.7 million concepts harnessed automatically from a corpus of 1.68 billion web pages. Unlike traditional taxonomies that treat knowledge as black and white, it uses probabilities to model the inconsistent, ambiguous, and uncertain information it contains. We present details of how the taxonomy is constructed, its probabilistic modeling, and its potential applications in text understanding.

      Published In

      SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
      May 2012
      886 pages
      ISBN:9781450312479
      DOI:10.1145/2213836

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. knowledgebase
      2. taxonomy
      3. text understanding

      Qualifiers

      • Research-article

      Conference

      SIGMOD/PODS '12

      Acceptance Rates

      SIGMOD '12 Paper Acceptance Rate 48 of 289 submissions, 17%;
      Overall Acceptance Rate 785 of 4,003 submissions, 20%
