Chapter, DOI: 10.1007/978-3-319-91908-9_13

Knowledge Harvesting: Achievements and Challenges

Published: 11 March 2022

Abstract

This article gives an overview of knowledge harvesting: the automatic construction of large, high-quality knowledge bases from Internet sources. The first part reviews key principles and best-practice methods. The second part points out open challenges for future research.

Published In

Computing and Software Science
603 pages
ISBN: 978-3-319-91907-2
DOI: 10.1007/978-3-319-91908-9

Publisher: Springer-Verlag, Berlin, Heidelberg
