RigoBERTa: A State-of-the-Art Language Model For Spanish
Authors:
Alejandro Vaca Serrano,
Guillem Garcia Subies,
Helena Montoro Zamorano,
Nuria Aldama Garcia,
Doaa Samy,
David Betancur Sanchez,
Antonio Moreno Sandoval,
Marta Guerrero Nieto,
Alvaro Barbero Jimenez
Abstract:
This paper presents RigoBERTa, a state-of-the-art language model for Spanish. RigoBERTa is trained on a well-curated corpus composed of several subcorpora with key features. It follows the DeBERTa architecture, which has several advantages over other architectures of similar size, such as BERT or RoBERTa. RigoBERTa's performance is assessed on 13 NLU tasks in comparison with other available Spanish language models, namely MarIA, BERTIN and BETO. RigoBERTa outperformed the three models in 10 of the 13 tasks, achieving new state-of-the-art results.
Submitted 3 June, 2022; v1 submitted 27 April, 2022;
originally announced May 2022.
Jabalin: a Comprehensive Computational Model of Modern Standard Arabic Verbal Morphology Based on Traditional Arabic Prosody
Authors:
Alicia Gonzalez Martinez,
Susana Lopez Hervas,
Doaa Samy,
Carlos G. Arques,
Antonio Moreno Sandoval
Abstract:
The computational handling of Modern Standard Arabic is a challenge in the field of natural language processing because of its very rich morphology. However, several authors have pointed out that the Arabic morphological system is in fact extremely regular. Existing Arabic morphological analyzers have exploited this regularity to varying extents, yet we believe there is still some scope for improvement. Taking inspiration from traditional Arabic prosody, we have designed and implemented a compact and simple morphological system which, in our opinion, takes further advantage of the regularities found in the Arabic morphological system. The output of the system is a large-scale lexicon of inflected forms that has subsequently been used to create an online interface for a morphological analyzer of Arabic verbs. The Jabalin Online Interface is available at http://elvira.lllf.uam.es/jabalin/, hosted at the LLI-UAM lab. The generation system is also available under a GNU GPL 3 license.
Submitted 29 June, 2014;
originally announced June 2014.