An Ensemble of LLMs Finetuned with LoRA for NER in Portuguese Legal Documents

Nunes, Rafael Oleques; Puttlitz, Letícia Maria; Boll, Antonio Oss; Spritzer, Andre; Freitas, Carla Maria Dal Sasso; Balreira, Dennis Giovani; Tavares, Anderson Rocha

doi:10.1007/978-3-031-79029-4_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15412)

Included in the following conference series:

Brazilian Conference on Intelligent Systems

Abstract

Given the high computational cost of traditional fine-tuning methods and the goal of improving performance, this study investigates the application of low-rank adaptation (LoRA) for fine-tuning BERT models for Portuguese legal Named Entity Recognition (NER) and the integration of Large Language Models (LLMs) in an ensemble setup. Focusing on the underrepresented Portuguese language, we aim to examine the reliability of extractions enabled by LoRA models and to glean actionable insights from the results of both LoRA models and LLMs operating in ensembles. Achieving F1-scores of 88.49% on the LeNER-Br corpus and 81.00% on the UlyssesNER-Br corpus, the LoRA models demonstrated competitive performance, approaching the state of the art. Our research demonstrates that incorporating class definitions and counting votes per class substantially improves LLM ensemble results. Overall, this contribution advances AI-powered legal text mining by proposing small models and initial prompt engineering for low-resource conditions that are scalable for broader representation.
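
The abstract does not spell out how votes are counted, so the following is only a minimal sketch of one plausible reading: each ensemble member labels the same tokens, and the label with the most votes wins for each token. The function name `vote_per_token`, the tie-breaking rule (fall back to the outside label `O`), and the labels `LAW`, `ORG`, and `PER` are illustrative assumptions, not the scheme or tag set used in the paper.

```python
from collections import Counter


def vote_per_token(predictions, fallback="O"):
    """Combine label sequences from several models by majority vote per token.

    `predictions` holds one label sequence per model, all over the same tokens.
    Assumption: ties between the most-voted labels resolve to `fallback`.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same tokens")

    combined = []
    for labels in zip(*predictions):
        # Count how many models voted for each class on this token.
        ranked = Counter(labels).most_common()
        top_label, top_votes = ranked[0]
        # If the runner-up has as many votes as the leader, it is a tie.
        if len(ranked) > 1 and ranked[1][1] == top_votes:
            combined.append(fallback)
        else:
            combined.append(top_label)
    return combined


# Hypothetical outputs of three models for a four-token sentence.
model_a = ["LAW", "LAW", "O", "O"]
model_b = ["LAW", "LAW", "O", "ORG"]
model_c = ["LAW", "O", "O", "PER"]

print(vote_per_token([model_a, model_b, model_c]))
# ['LAW', 'LAW', 'O', 'O']
```

On the last token the three models disagree completely, so the sketch falls back to `O`; a different tie-breaking rule (for example, trusting the strongest single model) would be an equally reasonable assumption.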

Notes

1. In English: foundation.
2. https://github.com/TimDettmers/bitsandbytes

Acknowledgements

This work has been partially funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. We also acknowledge financial support from the Brazilian funding agency CNPq.

Author information

Authors and Affiliations

Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Rafael Oleques Nunes, Letícia Maria Puttlitz, Antonio Oss Boll, Andre Spritzer, Carla Maria Dal Sasso Freitas, Dennis Giovani Balreira & Anderson Rocha Tavares

Corresponding author

Correspondence to Rafael Oleques Nunes.

Editor information

Editors and Affiliations

Universidade Federal Fluminense, Niterói, Brazil
Aline Paes
Instituto Tecnológico de Aeronáutica, São José dos Campos, Brazil
Filipe A. N. Verri

About this paper

Cite this paper

Nunes, R.O. et al. (2025). An Ensemble of LLMs Finetuned with LoRA for NER in Portuguese Legal Documents. In: Paes, A., Verri, F.A.N. (eds) Intelligent Systems. BRACIS 2024. Lecture Notes in Computer Science(), vol 15412. Springer, Cham. https://doi.org/10.1007/978-3-031-79029-4_9

DOI: https://doi.org/10.1007/978-3-031-79029-4_9
Published: 30 January 2025
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-79028-7
Online ISBN: 978-3-031-79029-4
eBook Packages: Computer Science, Computer Science (R0)
