DOI: 10.1145/3178876.3186000

HighLife: Higher-arity Fact Harvesting

Published: 10 April 2018

Abstract

Text-based knowledge extraction methods for populating knowledge bases have focused on binary facts: relationships between two entities. However, in advanced domains such as health, it is often crucial to consider ternary and higher-arity relations. An example is to capture which drug is used for which disease at which dosage (e.g., 2.5 mg/day) for which kinds of patients (e.g., children vs. adults). In this work, we present an approach to harvest higher-arity facts from textual sources. Our method is distantly supervised by seed facts, and uses the fact-pattern duality principle to gather fact candidates with high recall. For high precision, we devise a constraint-based reasoning method to eliminate false candidates. A major novelty is in coping with the difficulty that higher-arity facts are often expressed only partially in texts and strewn across multiple sources. For example, one sentence may refer to a drug, a disease and a group of patients, whereas another sentence talks about the drug, its dosage and the target group without mentioning the disease. Our methods cope well with such partially observed facts, at both the pattern-learning and constraint-reasoning stages. Experiments with health-related documents and with news articles demonstrate the viability of our method.
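To make the notion of a partially observed fact concrete, the following is a minimal Python sketch of one way to represent the two sentences from the example and combine them. It is an illustration only, not the method of the paper: the role names, the helper functions `make_fact`, `compatible` and `merge`, and the example values are all assumptions introduced here. In particular, the naive merge rule (agree on every shared slot) is a stand-in; the abstract states that the actual approach relies on pattern learning and constraint-based reasoning to decide which candidates are valid.

```python
from typing import Dict, Optional

# Hypothetical role names, taken from the drug example in the abstract.
ROLES = ("drug", "disease", "dosage", "patient_group")

# A partial fact maps every role to a value, or to None if unobserved.
PartialFact = Dict[str, Optional[str]]


def make_fact(**slots: str) -> PartialFact:
    """Build a fact with every role present; unobserved roles are None."""
    unknown = set(slots) - set(ROLES)
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    return {role: slots.get(role) for role in ROLES}


def compatible(a: PartialFact, b: PartialFact) -> bool:
    """True if the facts share at least one filled role and never
    disagree on a role that both of them fill (a naive assumption)."""
    shared = [r for r in ROLES if a[r] is not None and b[r] is not None]
    return bool(shared) and all(a[r] == b[r] for r in shared)


def merge(a: PartialFact, b: PartialFact) -> PartialFact:
    """Combine two compatible partial facts into a more complete one."""
    if not compatible(a, b):
        raise ValueError("facts conflict or share no filled role")
    return {r: a[r] if a[r] is not None else b[r] for r in ROLES}


# Invented placeholder values, not data from the paper.
s1 = make_fact(drug="drugX", disease="diseaseY", patient_group="children")
s2 = make_fact(drug="drugX", dosage="2.5 mg/day", patient_group="children")
merged = merge(s1, s2)
```

Here `s1` lacks the dosage and `s2` lacks the disease, mirroring the two sentences in the example; `merged` fills all four roles. Agreement on shared slots alone can join observations that do not belong to the same fact, which is why a real system needs a stronger check than this sketch provides.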

Published In

WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. distant supervision
  2. health
  3. higher-arity relation extraction
  4. knowledge base construction
  5. knowledge graphs
  6. partial facts
  7. text-based knowledge harvesting
  8. tree pattern learning

Qualifiers

  • Research-article

Conference

WWW '18
Sponsor:
  • IW3C2
WWW '18: The Web Conference 2018
April 23 - 27, 2018
Lyon, France

Acceptance Rates

WWW '18 Paper Acceptance Rate: 170 of 1,155 submissions, 15%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 178
  • Downloads (last 6 weeks): 16
Reflects downloads up to 03 Feb 2025

Cited By

  • (2023) Active Learning for Cross-sentence N-ary Relation Extraction. Information Sciences. DOI: 10.1016/j.ins.2023.119328 (119328). Online publication date: Jun-2023
  • (2022) What a Publication Tells You—Benefits of Narrative Information Access in Digital Libraries. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries. DOI: 10.1145/3529372.3530928 (1-8). Online publication date: 20-Jun-2022
  • (2022) The Advancement of Biomedical Relation Extraction through Cloud Computing and Big Data Analysis. 2022 IEEE 2nd International Conference on Electronic Technology, Communication and Information (ICETCI). DOI: 10.1109/ICETCI55101.2022.9832057 (755-760). Online publication date: 27-May-2022
  • (2022) HYPER2: Hyperbolic embedding for hyper-relational link prediction. Neurocomputing 492. DOI: 10.1016/j.neucom.2022.04.026 (440-451). Online publication date: Jul-2022
  • (2021) Nested relation extraction with iterative neural network. Frontiers of Computer Science 15:3. DOI: 10.1007/s11704-020-9420-6. Online publication date: 16-Jan-2021
  • (2021) Diagnosis Ranking with Knowledge Graph Convolutional Networks. Advances in Information Retrieval. DOI: 10.1007/978-3-030-72113-8_24 (359-374). Online publication date: 27-Mar-2021
  • (2020) FoodKG: A Tool to Enrich Knowledge Graphs Using Machine Learning Techniques. Frontiers in Big Data 3. DOI: 10.3389/fdata.2020.00012. Online publication date: 29-Apr-2020
  • (2020) Context-Compatible Information Fusion for Scientific Knowledge Graphs. Digital Libraries for Open Knowledge. DOI: 10.1007/978-3-030-54956-5_3 (33-47). Online publication date: 17-Aug-2020
  • (2019) Nested Relation Extraction with Iterative Neural Network. Proceedings of the 28th ACM International Conference on Information and Knowledge Management. DOI: 10.1145/3357384.3358003 (1001-1010). Online publication date: 3-Nov-2019
  • (2019) Auto-completion for Data Cells in Relational Tables. Proceedings of the 28th ACM International Conference on Information and Knowledge Management. DOI: 10.1145/3357384.3357932 (761-770). Online publication date: 3-Nov-2019
