OntoILPER: an ontology- and inductive logic programming-based system to extract entities and relations from text

  • Regular Paper
Knowledge and Information Systems

Abstract

Named entity recognition (NER) and relation extraction (RE) are two important subtasks in information extraction (IE). Most current learning methods for NER and RE rely on supervised machine learning techniques, with more accurate results for NER than for RE. This paper presents OntoILPER, a system for extracting entity and relation instances from unstructured texts using ontologies and inductive logic programming, a symbolic machine learning technique. OntoILPER uses the domain ontology and takes advantage of a highly expressive relational hypothesis space for representing examples whose structure is relevant to IE. It induces extraction rules that subsume examples of entity and relation instances from a specific graph-based model of sentence representation. Furthermore, OntoILPER enables the exploitation of the domain ontology and further background knowledge in the form of relational features. To evaluate OntoILPER, several experiments over the TREC corpus were conducted for both the NER and RE tasks, and the results demonstrate its effectiveness in both. This paper also provides a comparative assessment of OntoILPER against other NER and RE systems, showing that OntoILPER is very competitive on NER and outperforms the selected systems on RE.

Notes

  1. Horn clauses are first-order clauses containing at most one positive literal.

  2. ACE (2004). Automatic Content Extraction. Relation Detection and Characterization 2004 Evaluation. http://www.itl.nist.gov/iad/mig/tests/ace/2004.

  3. In an ontology, TBox statements describe a system in terms of a controlled vocabulary, i.e., a set of classes and properties, whereas the ABox is the assertional component, i.e., TBox-compliant statements about that vocabulary.

  4. Aleph Manual. http://www.cs.ox.ac.uk/activities/machinelearning/Aleph/aleph.

  5. Stanford CoreNLP Tools. http://nlp.stanford.edu/software/corenlp.shtml.

  6. Apache OpenNLP. The Apache Software Foundation. http://opennlp.apache.org.

  7. We have also experimented with 4-grams, but bi-grams and tri-grams achieved better results in our preliminary experiments.

  8. The ProGolem ILP system runs on YAP Prolog (http://www.dcc.fc.up.pt/~vsc/Yap).

  9. http://cogcomp.cs.illinois.edu/Data/ER/conll04.corp.

  10. LIBSVM. A library for Support Vector Machines. https://www.csie.ntu.edu.tw/~cjlin/libsvm/.

  11. WordNet. A lexical database for English. https://wordnet.princeton.edu.
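
Two of the notes above lend themselves to a small illustration: the definition of a Horn clause (Note 1) and the bi-gram and tri-gram features (Note 7). The following is a minimal sketch only, not the OntoILPER implementation. The clause encoding, the function names is_horn_clause and ngrams, and the example rule located_in(X,Y) are all assumptions introduced here for illustration.

```python
from typing import List, Tuple

# Assumed encoding: a literal is (text, is_positive); a clause is a
# disjunction of such literals.
Literal = Tuple[str, bool]


def is_horn_clause(clause: List[Literal]) -> bool:
    """Return True if the clause has at most one positive literal (Note 1)."""
    return sum(1 for _, positive in clause if positive) <= 1


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous windows of n tokens (bi-grams for n=2, tri-grams for n=3)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical rule  located_in(X,Y) :- person(X), location(Y)
# written as the clause  located_in(X,Y) OR NOT person(X) OR NOT location(Y).
rule = [("located_in(X,Y)", True), ("person(X)", False), ("location(Y)", False)]

# Two positive literals, so this clause is not a Horn clause.
not_horn = [("p(X)", True), ("q(X)", True)]

# Hypothetical tokenized sentence.
tokens = ["OntoILPER", "extracts", "entities", "and", "relations"]

if __name__ == "__main__":
    print(is_horn_clause(rule), is_horn_clause(not_horn))
    print(ngrams(tokens, 2))
    print(ngrams(tokens, 3))
```

A rule with a single head and a conjunctive body, as in Prolog, always passes the check, because only the head is a positive literal.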

Acknowledgements

The authors are grateful to Hilário Oliveira for his help in the development of some of the OntoILPER components. We also thank the National Council for Scientific and Technological Development (CNPq/Brazil) for financial support (Grant No. 140791/2010-8).

Author information

Corresponding author

Correspondence to Rinaldo Lima.

About this article

Cite this article

Lima, R., Espinasse, B. & Freitas, F. OntoILPER: an ontology- and inductive logic programming-based system to extract entities and relations from text. Knowl Inf Syst 56, 223–255 (2018). https://doi.org/10.1007/s10115-017-1108-3

  • DOI: https://doi.org/10.1007/s10115-017-1108-3