research-article

HighLife: Higher-arity Fact Harvesting

Authors:

Gerhard Weikum

WWW '18: Proceedings of the 2018 World Wide Web Conference

Pages 1013-1022

https://doi.org/10.1145/3178876.3186000

Published: 10 April 2018

Abstract

Text-based knowledge extraction methods for populating knowledge bases have focused on binary facts: relationships between two entities. However, in advanced domains such as health, it is often crucial to consider ternary and higher-arity relations. An example is capturing which drug is used for which disease at which dosage (e.g., 2.5 mg/day) for which kinds of patients (e.g., children vs. adults). In this work, we present an approach for harvesting higher-arity facts from textual sources. Our method is distantly supervised by seed facts and uses the fact-pattern duality principle to gather fact candidates with high recall. For high precision, we devise a constraint-based reasoning method to eliminate false candidates. A major novelty lies in coping with the difficulty that higher-arity facts are often expressed only partially in texts and strewn across multiple sources. For example, one sentence may refer to a drug, a disease, and a group of patients, whereas another sentence talks about the drug, its dosage, and the target group without mentioning the disease. Our methods cope well with such partially observed facts at both the pattern-learning and constraint-reasoning stages. Experiments with health-related documents and with news articles demonstrate the viability of our method.
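The difficulty the abstract singles out, one higher-arity fact observed only in fragments across different sentences, can be made concrete with a small data-structure sketch. The Python below is not the HighLife algorithm. It assumes a hypothetical four-role drug-usage relation (`drug`, `disease`, `dosage`, `patients`), treats each sentence as a partial role assignment, and greedily merges two observations when they agree on every role both mention and share at least `min_shared` roles (a threshold chosen arbitrarily for illustration). The names `Observation`, `compatible`, `merge`, and `combine`, and the example values, are made up for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical role inventory for a drug-usage relation (illustrative assumption,
# not the schema used in the paper).
ROLES = ("drug", "disease", "dosage", "patients")


@dataclass(frozen=True)
class Observation:
    """A partial n-ary fact seen in one or more sentences; unmentioned roles are omitted."""
    args: Tuple[Tuple[str, str], ...]  # sorted (role, value) pairs
    sources: Tuple[str, ...]           # ids of the sentences that contributed

    @staticmethod
    def of(source: str, **args: str) -> "Observation":
        unknown = set(args) - set(ROLES)
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")
        return Observation(tuple(sorted(args.items())), (source,))

    def roles(self) -> Dict[str, str]:
        return dict(self.args)


def compatible(a: Observation, b: Observation, min_shared: int = 2) -> bool:
    """True if a and b share at least `min_shared` roles and agree on all shared roles."""
    ra, rb = a.roles(), b.roles()
    shared = ra.keys() & rb.keys()
    return len(shared) >= min_shared and all(ra[r] == rb[r] for r in shared)


def merge(a: Observation, b: Observation) -> Observation:
    """Union of the role assignments of two compatible observations."""
    combined = {**a.roles(), **b.roles()}
    return Observation(tuple(sorted(combined.items())), a.sources + b.sources)


def combine(observations: List[Observation], min_shared: int = 2) -> List[Observation]:
    """Greedily merge the first compatible pair found, repeating until no pair is compatible."""
    facts = list(observations)
    changed = True
    while changed:
        changed = False
        for i in range(len(facts)):
            for j in range(i + 1, len(facts)):
                if compatible(facts[i], facts[j], min_shared):
                    merged = merge(facts[i], facts[j])
                    facts = [f for k, f in enumerate(facts) if k not in (i, j)] + [merged]
                    changed = True
                    break
            if changed:
                break
    return facts


if __name__ == "__main__":
    obs = [
        Observation.of("s1", drug="DrugX", disease="DiseaseY", patients="children"),
        Observation.of("s2", drug="DrugX", dosage="2.5 mg/day", patients="children"),
        Observation.of("s3", drug="DrugX", dosage="10 mg/day", patients="adults"),
    ]
    for fact in combine(obs):
        missing = [r for r in ROLES if r not in fact.roles()]
        print(fact.sources, fact.roles(), "missing:", missing)
```

With these inputs, s1 (drug, disease, patients) and s2 (drug, dosage, patients) agree on `drug` and `patients` and merge into a complete four-role candidate, while s3 disagrees on `patients` (and, after the merge, on `dosage`) and remains a separate partial candidate. The greedy pairwise merge is order-dependent and lets each observation join only one group; resolving ambiguous merges robustly would call for something more principled, such as the constraint-based reasoning the abstract describes.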

Index Terms

1. Applied computing
  1. Life and medical sciences
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction

Published In

WWW '18: Proceedings of the 2018 World Wide Web Conference

April 2018

2000 pages

ISBN: 9781450356398

General Chairs:
Pierre-Antoine Champin (Université Claude Bernard Lyon 1, France)
Fabien Gandon (Inria, Université Côte d'Azur, CNRS, I3S, France)
Lionel Médini (Université Claude Bernard Lyon 1, France)

Program Chairs:
Mounia Lalmas (Spotify, UK)
Panagiotis G. Ipeirotis (New York University, USA)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Conference

WWW '18

Sponsor:

IW3C2

WWW '18: The Web Conference 2018

April 23-27, 2018

Lyon, France

Acceptance Rates

WWW '18 paper acceptance rate: 170 of 1,155 submissions, 15%

Overall acceptance rate: 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Total citations: 14
Total downloads: 2,633
Downloads (last 12 months): 178
Downloads (last 6 weeks): 16

Reflects downloads up to 03 Feb 2025.

Citations

Cited By

Seo S, Oh B, Jeoung J, Kim D, Lee K, Shin D, Lee Y (2023). Active Learning for Cross-sentence N-ary Relation Extraction. Information Sciences, article 119328. Online publication date: Jun-2023.
https://doi.org/10.1016/j.ins.2023.119328
Kroll H, Plötzky F, Pirklbauer J, Balke W, Aizawa A, Mandl T, Carevic Z, Hinze A, Mayr P, Schaer P (2022). What a Publication Tells You—Benefits of Narrative Information Access in Digital Libraries. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, pp. 1-8. Online publication date: 20-Jun-2022.
https://dl.acm.org/doi/10.1145/3529372.3530928
Wang Z (2022). The Advancement of Biomedical Relation Extraction through Cloud Computing and Big Data Analysis. 2022 IEEE 2nd International Conference on Electronic Technology, Communication and Information (ICETCI), pp. 755-760. Online publication date: 27-May-2022.
https://doi.org/10.1109/ICETCI55101.2022.9832057
Yan S, Zhang Z, Sun X, Xu G, Jin L, Li S (2022). HYPER2: Hyperbolic embedding for hyper-relational link prediction. Neurocomputing 492, pp. 440-451. Online publication date: Jul-2022.
https://doi.org/10.1016/j.neucom.2022.04.026
Cao Y, Chen D, Xu Z, Li H, Luo P (2021). Nested relation extraction with iterative neural network. Frontiers of Computer Science 15:3. Online publication date: 16-Jan-2021.
https://doi.org/10.1007/s11704-020-9420-6
Liu B, Zuccon G, Hua W, Chen W (2021). Diagnosis Ranking with Knowledge Graph Convolutional Networks. Advances in Information Retrieval, pp. 359-374. Online publication date: 27-Mar-2021.
https://doi.org/10.1007/978-3-030-72113-8_24
Gharibi M, Zachariah A, Rao P (2020). FoodKG: A Tool to Enrich Knowledge Graphs Using Machine Learning Techniques. Frontiers in Big Data 3. Online publication date: 29-Apr-2020.
https://doi.org/10.3389/fdata.2020.00012
Kroll H, Kalo J, Nagel D, Mennicke S, Balke W (2020). Context-Compatible Information Fusion for Scientific Knowledge Graphs. Digital Libraries for Open Knowledge, pp. 33-47. Online publication date: 17-Aug-2020.
https://doi.org/10.1007/978-3-030-54956-5_3
Cao Y, Chen D, Li H, Luo P, Zhu W, Tao D, Cheng X, Cui P, Rundensteiner E, Carmel D, He Q, Xu Yu J (2019). Nested Relation Extraction with Iterative Neural Network. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1001-1010. Online publication date: 3-Nov-2019.
https://dl.acm.org/doi/10.1145/3357384.3358003
Zhang S, Balog K, Zhu W, Tao D, Cheng X, Cui P, Rundensteiner E, Carmel D, He Q, Xu Yu J (2019). Auto-completion for Data Cells in Relational Tables. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 761-770. Online publication date: 3-Nov-2019.
https://dl.acm.org/doi/10.1145/3357384.3357932
