Abstract
Wikipedia has become the largest knowledge repository on the Web. However, most of its semantic knowledge is expressed in natural language, which is readable by humans but difficult for computers to process. To establish the missing link between Wikipedia and a semantic network, this paper proposes a relation discovery method that can: (1) discover and characterize a large collection of relations from Wikipedia by exploiting relation pattern regularity, relation distribution regularity, and relation instance redundancy; and (2) annotate the hyperlinks between Wikipedia articles with the discovered semantic relations. With this method we discover 14,299 relations, 105,661 relation patterns, and 5,214,175 relation instances from Wikipedia, which can serve as a valuable resource for many NLP and AI tasks.
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Han, X., Song, X., Sun, L. (2016). Large Scale Semantic Relation Discovery: Toward Establishing the Missing Link Between Wikipedia and Semantic Network. In: Chen, H., Ji, H., Sun, L., Wang, H., Qian, T., Ruan, T. (eds) Knowledge Graph and Semantic Computing: Semantic, Knowledge, and Linked Big Data. CCKS 2016. Communications in Computer and Information Science, vol 650. Springer, Singapore. https://doi.org/10.1007/978-981-10-3168-7_6
DOI: https://doi.org/10.1007/978-981-10-3168-7_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3167-0
Online ISBN: 978-981-10-3168-7
eBook Packages: Computer Science (R0)