Abstract
This paper presents an evaluation of NERP-CRF, a Conditional Random Fields (CRF)-based tool for Portuguese Named Entity Recognition (NER), against three other publicly available NER tools for Portuguese. The tools are compared by the Recall and Precision each one obtains on the HAREM corpus, a gold standard for NER in Portuguese texts. The experiments were first conducted with ten entity categories and then with a reduced number of categories. The results show that NERP-CRF outperforms the other tools when sufficiently trained for four entity categories.
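To make the Precision and Recall comparison concrete, the sketch below scores predicted entity mentions against gold-standard ones. It is a minimal illustration under stated assumptions, not the evaluation procedure used in the paper: it assumes strict exact matching (a prediction counts as correct only if both its token span and its category equal a gold mention), whereas HAREM's official scoring also handles partial identification and alternative annotations. The `Entity` type, the `precision_recall` function and the example mentions are hypothetical; the category labels follow the style of HAREM's category names, and the four-category subset is chosen arbitrarily for illustration, not taken from the paper. The optional `categories` argument shows how restricting scoring to a reduced category set changes the figures.

```python
from typing import Iterable, NamedTuple, Optional


class Entity(NamedTuple):
    """Hypothetical entity mention: token span [start, end) plus its category."""
    start: int
    end: int
    category: str


def precision_recall(
    gold: Iterable[Entity],
    predicted: Iterable[Entity],
    categories: Optional[set[str]] = None,
) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity mentions.

    Assumption: a prediction is correct only if span AND category match.
    If `categories` is given, mentions of other categories are ignored
    on both sides, mimicking an evaluation over a reduced category set.
    """
    def keep(e: Entity) -> bool:
        return categories is None or e.category in categories

    gold_set = {e for e in gold if keep(e)}
    pred_set = {e for e in predicted if keep(e)}
    correct = len(gold_set & pred_set)  # mentions found with the right label

    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean of precision and recall (a common summary).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative (made-up) annotations for a single sentence.
gold = [Entity(0, 2, "PESSOA"), Entity(5, 6, "LOCAL"),
        Entity(8, 9, "TEMPO"), Entity(10, 12, "OBRA")]
pred = [Entity(0, 2, "PESSOA"), Entity(5, 6, "ORGANIZACAO"),
        Entity(8, 9, "TEMPO"), Entity(13, 14, "VALOR")]

print(precision_recall(gold, pred))  # (0.5, 0.5, 0.5)
reduced = {"PESSOA", "LOCAL", "ORGANIZACAO", "TEMPO"}  # arbitrary subset
print(tuple(round(x, 3) for x in precision_recall(gold, pred, reduced)))
```

With all categories, two of the four predictions are correct, so both measures are 0.5. Restricting scoring to the reduced set drops the `OBRA` and `VALOR` mentions from both sides, which raises both measures to about 0.667; this illustrates why results over ten categories and over a reduced set are not directly comparable.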
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
do Amaral, D.O.F., Fonseca, E., Lopes, L., Vieira, R. (2014). Comparing NERP-CRF with Publicly Available Portuguese Named Entities Recognition Tools. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_27
DOI: https://doi.org/10.1007/978-3-319-09761-9_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09760-2
Online ISBN: 978-3-319-09761-9
eBook Packages: Computer Science (R0)