Abstract
Open Relation Extraction from text is challenging because of the linguistic knowledge it requires and the sophistication of the language processing techniques involved. This paper presents the extraction and structuring of open relations between named entities in Portuguese texts. We apply a Conditional Random Fields model to extract relation descriptors between named entities of the Person, Place and Organisation categories, reaching an F-measure of 0.64. To make better sense of the output, we structure the extracted relation descriptors using mining configurations.
Notes
- 1.
Recognition of Relation between Named Entities.
- 2.
- 3.
- 4.
Note that preposition-article contractions are split (“da” and “do” become “de + a” and “de + o”).
- 5.
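The contraction splitting described in note 4 can be illustrated with a minimal sketch. It is not the authors' preprocessing code: the function name `split_contractions` and the lookup table are hypothetical, and only the “da” and “do” entries come from the note. The plural forms “das” and “dos” are added here as an assumption about how the same rule would extend.

```python
# Hypothetical lookup table: only "da" and "do" are taken from the note;
# "das" and "dos" are an assumed extension of the same rule.
CONTRACTIONS = {
    "da": ("de", "a"),
    "do": ("de", "o"),
    "das": ("de", "as"),
    "dos": ("de", "os"),
}


def split_contractions(tokens):
    """Replace each known contraction with its preposition and article."""
    expanded = []
    for token in tokens:
        parts = CONTRACTIONS.get(token.lower())
        if parts is None:
            # Not a known contraction: keep the token unchanged.
            expanded.append(token)
        else:
            # Known contraction: emit the preposition and the article.
            expanded.extend(parts)
    return expanded


tokens = "O presidente da empresa visitou a sede do banco".split()
expanded = split_contractions(tokens)
print(expanded)
```

A plain table lookup of this kind also splits capitalised forms inside proper names (for example “Da” in a surname), so a real pipeline would likely need to protect named-entity spans before applying it.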
Acknowledgments
We thank CNPq, CAPES and FAPERGS for their financial support.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Collovini, S., Machado, G., Vieira, R. (2016). Extracting and Structuring Open Relations from Portuguese Text. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_16
DOI: https://doi.org/10.1007/978-3-319-41552-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41551-2
Online ISBN: 978-3-319-41552-9
eBook Packages: Computer Science, Computer Science (R0)