chapter

Knowledge Harvesting: Achievements and Challenges

Authors: Gerhard Weikum, Johannes Hoffart, Fabian Suchanek

Computing and Software Science

Pages 217 - 235

https://doi.org/10.1007/978-3-319-91908-9_13

Published: 11 March 2022

Abstract

This article gives an overview of knowledge harvesting: the automatic construction of large, high-quality knowledge bases from Internet sources. The first part reviews key principles and best-practice methods. The second part points out open challenges for future research.

Published In

Computing and Software Science

603 pages

ISBN:978-3-319-91907-2

DOI:10.1007/978-3-319-91908-9

Editors:
Bernhard Steffen, Technical University of Dortmund, Dortmund, Germany
Gerhard Woeginger, RWTH Aachen, Aachen, Germany

Copyright © 2019 Springer Nature Switzerland AG.

Publisher

Springer-Verlag

Berlin, Heidelberg
