Abstract
The concurrent growth of the Document Web and the Data Web demands accurate information extraction tools to bridge the gap between the two. In particular, the extraction of knowledge about real-world entities is indispensable for populating knowledge bases on the Web of Data. Here, we focus on recognizing the types of entities in order to populate knowledge bases and enable subsequent knowledge extraction steps. We present CETUS, a baseline approach to entity type extraction. CETUS is based on a three-step pipeline comprising (i) offline, knowledge-driven extraction of type patterns from natural-language corpora based on grammar rules, (ii) an analysis of the input text to extract types, and (iii) the mapping of the extracted type evidence to a subset of the DOLCE+DnS Ultra Lite ontology classes. We implement and compare two approaches to the third step, one based on the YAGO ontology and one on the FOX entity recognition tool.
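Steps (ii) and (iii) of the pipeline can be illustrated with a minimal sketch. It is not the CETUS implementation: it assumes a single hand-written copula pattern in place of the grammar derived offline in step (i), and a toy lookup table in place of the YAGO- or FOX-based mapping. The names `TYPE_PATTERN`, `HEAD_NOUN_TO_DUL`, `extract_type_phrase` and `map_to_dul` are hypothetical.

```python
import re

# Assumption: one copula pattern, "<entity> is/was a/an/the <type phrase>",
# stands in for the type patterns that step (i) would derive offline.
TYPE_PATTERN = re.compile(
    r"\b(?:is|was)\s+(?:a|an|the)\s+"
    r"(?P<type>[A-Za-z][A-Za-z -]*?)"
    r"(?=\s+(?:of|in|from|who|that|which)\b|[.,;]|$)"
)

# Assumption: a toy table from head nouns to dul classes, standing in for
# the ontology-based mapping of step (iii).
HEAD_NOUN_TO_DUL = {
    "village": "dul:Place",
    "city": "dul:Place",
    "company": "dul:Organization",
    "singer": "dul:Person",
}


def extract_type_phrase(entity, sentence):
    """Step (ii): return the type phrase that follows the entity mention."""
    start = sentence.find(entity)
    if start < 0:
        return None  # the entity is not mentioned in this sentence
    match = TYPE_PATTERN.search(sentence, start + len(entity))
    return match.group("type").strip() if match else None


def map_to_dul(type_phrase):
    """Step (iii): map the head noun (last token) of the phrase to a class."""
    if not type_phrase:
        return None
    head = type_phrase.split()[-1].lower()
    return HEAD_NOUN_TO_DUL.get(head)


if __name__ == "__main__":
    sentence = "Leipzig is a large city in Saxony."
    phrase = extract_type_phrase("Leipzig", sentence)
    print(phrase, "->", map_to_dul(phrase))
```

In this sketch the head noun is simply taken to be the last token of the extracted phrase, which only holds for simple English noun phrases; a grammar-based extractor would not need that shortcut.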
Notes
- 1.
- 2.
- 3.
- 4. See http://stlab.istc.cnr.it/stlab/WikipediaOntology/. Throughout this paper, we use the prefix dul for types of this ontology.
- 5.
- 6. Abbreviations in Listing 1.1: ADJ = adjective, CD = cardinal number.
- 7. The complete grammar can be found in the project's source code repository.
- 8. The rdfs prefix stands for http://www.w3.org/2000/01/rdf-schema# while the prefix ex can stand for any user-defined vocabulary, e.g., http://example.com/.
- 9. This mapping can be found in the project's git repository at https://github.com/AKSW/Cetus/blob/master/DOLCE_YAGO_links.nt.
- 10. Throughout this paper, we use the prefix yago for http://yago-knowledge.org/resource/.
- 11. The results of the challenge can be found at https://github.com/anuzzolese/oke-challenge#results.
Acknowledgements
This work has been supported by the FP7 project GeoKnow (GA No. 318159) and the BMWI Project SAKE (Project No. 01MD15006E).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Röder, M., Usbeck, R., Speck, R., Ngomo, AC.N. (2015). CETUS – A Baseline Approach to Type Extraction. In: Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A. (eds) Semantic Web Evaluation Challenges. SemWebEval 2015. Communications in Computer and Information Science, vol 548. Springer, Cham. https://doi.org/10.1007/978-3-319-25518-7_2
DOI: https://doi.org/10.1007/978-3-319-25518-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25517-0
Online ISBN: 978-3-319-25518-7
eBook Packages: Computer Science, Computer Science (R0)