  • Vicent is an English, Spanish, and Catalan certified sworn translator appointed by the Government of Catalonia and ha...
  • Dr. Sharon O'Brien, Dr. Benjamin Cowan
Recent major changes and technological advances have consolidated machine translation (MT) as a key player to be considered in the language services industry (translation, localization, internationalization, etc.). In many cases, it is even an essential player due to budget and time constraints (Martín-Mor et al., 2016).

In recent years much attention has been paid to MT research, and MT use by professional or amateur users has increased (Guerberof & Moorkens, 2019). However, the best-known and most widely used MT platforms are commercial, and the privacy of data entered may be at stake (Moorkens & Lewis, 2019), as it is not known whether the program is respecting the confidential information users have on their computers.
In addition, new MT systems, migrating from Statistical Machine Translation (SMT) to Neural Machine Translation (NMT) paradigms, require more and more computational power (GPUs) and larger amounts of bilingual text (corpora) to work efficiently (Forcada, 2017). This also complicates the creation of high-quality MT engines that could be a viable alternative to those of big technology companies, increasing the reliance on commercial platforms.

Additionally, these new requirements make it difficult to create free/open-source options that respect users' privacy, even for languages with large amounts of available corpora. The case of minoritized or stateless languages, such as Catalan, is much more complex. Thus, even when a free/open-source MT engine is available, users are normally obliged to choose between quality (commercial) and data privacy (free/open-source).

Softcatalà (a non-profit organization whose aim is to promote the use of Catalan in computing, the Internet and new technologies) has created a new free/open-source NMT engine: Softcatalà’s Translator. This study analyses this new MT engine and compares it with Apertium (its previous free/open-source MT engine, based on rule-based machine translation) and Google Translate (the flagship of commercial MT) in the English-Catalan combination.

Although MT engine developers use automatic metrics for MT engine evaluation, human evaluation remains the gold standard despite its cost (Läubli et al., 2020). Using TAUS DQF tools (O'Brien, 2012; Görög, 2014), quality (in terms of relative ranking, accuracy and fluency) and productivity (comparing editing times and distances) have been evaluated with the participation of 11 evaluators. Results show that Softcatalà's Translator offers superior quality and productivity to its competitors. Thus, a different and ethical MT is possible. Users can achieve high-quality translations while ensuring their data privacy.
This study evaluates the machine translation (MT) quality of two state-of-the-art large language models (LLMs) against a traditional neural machine translation (NMT) system across four language pairs in the legal domain. It combines automatic evaluation metrics (AEMs) and human evaluation (HE) by professional translators to assess translation ranking, fluency and adequacy. The results indicate that while Google Translate generally outperforms LLMs in AEMs, human evaluators rate LLMs, especially GPT-4, comparably or slightly better in terms of producing contextually adequate and fluent translations. This discrepancy suggests LLMs' potential in handling specialized legal terminology and context, highlighting
We present the MEDDOPLACE task, the first initiative addressing the automatic detection and normalization of all location-relevant entity types present in clinical texts. The resources resulting from MEDDOPLACE can be directly useful for characterizing location information of importance for disease outbreak monitoring, diagnosis and prognosis, improving patient care and safety, and analyzing patient movements, mobility, and travels, among many other health-related applications. MEDDOPLACE relied on a high-quality manually annotated corpus of 1000 clinical cases in Spanish, together with location mention normalization (mapping to GeoNames, PlusCodes and SNOMED-CT concepts), as well as a Silver Standard dataset in multiple languages (including English, Italian, Portuguese, Dutch and Swedish). The results obtained by participating teams, as well as the generated resources, show a clear practical potential to improve location analysis for health-care data processing.
Perceptions and experiences of machine translation (MT) users before, during, and after their interaction with MT systems, products or services have been overlooked both in academia and in industry. Traditionally, the focus has been on productivity and quality, often neglecting the human factor. We propose the concept of Machine Translation User Experience (MTUX) for assessing, evaluating, and getting further information about the user experiences of people interacting with MT. By conducting a human-computer interaction (HCI)-based study with 15 professional translators, we present a methodological paper in which we analyse which method is best for measuring MTUX, and conclude by suggesting the use of the User Experience Questionnaire (UEQ). The measurement of MTUX will help every stakeholder in the MT industry: developers will be able to identify pain points for the users and solve them in the development process, resulting in better MTUX and higher adoption of MT systems or products by MT users.
This paper presents a user study with 15 professional translators in the English-Spanish combination. We present the concept of Machine Translation User Experience (MTUX) and compare the effects of traditional post-editing (TPE) and interactive post-editing (IPE) on MTUX, translation quality and productivity. Results suggest that translators prefer IPE to TPE because they are in control of the interaction in this new form of translator-computer interaction and feel more empowered in their interaction with Machine Translation. Productivity results also suggest that IPE may be an interesting alternative to TPE, given the fact that translators worked faster in IPE even though they had no experience in this new machine translation post-editing modality, but were already used to TPE.
Recent major changes and technological advances have consolidated machine translation (MT) as a key player to be considered in the language services world. In numerous instances, it is even an essential player due to budget and time constraints. Much attention has been paid to MT research recently, and MT use by professional or amateur users has increased. Yet, research has focused mainly on language combinations with huge amounts of online available corpora (e.g. English-Spanish). The situation for minoritized or stateless languages like Catalan is different. This study analyses Softcatalà’s new open-source, neural machine translation engine and compares it with Google Translate and Apertium in the English-Catalan language pair. Although MT engine developers use automatic metrics for MT engine evaluation, human evaluation remains the gold standard, despite its cost. Using TAUS DQF tools, translation quality (in terms of relative ranking, adequacy and fluency) and productivity (comp...
Bachelor's Thesis in Translation and Interpreting. Code: TI0983. Academic year: 2018/2019. In the current context of technological advances and the development of artificial intelligence (AI), the quality of machine translation (MT) has improved substantially, gaining ground in the world of translation, both professional and academic. MT, especially in the academic world, seems to be on the verge of posing a solid threat to traditional translation curricula, whose attention to technology may not accurately reflect the way in which the translation industry is implementing these technology-related advances. This study pays special attention to translation quality. Three translations by final-year students of the degree in Translation and Interpreting will be compared with a machine translation generated by DeepL, using a benchmark evaluation metric in the context of legal translation. The results...
The MEDDOPROF corpus is a collection of 2000 clinical cases from over 20 different specialties annotated with professions, employment statuses and other work-related activities. It is used for the MEDDOPROF Shared Task on occupations and employment status detection and normalization in Spanish medical documents, which will be held as part of IberLEF 2021. The sample set is composed of 15 clinical cases extracted from the training set from four different specialties: radiology, oncology, psychiatry and occupational health. The files are distributed as follows:
- For subtask 1 (MEDDOPROF-NER), annotations are distributed in Brat standoff format with PROFESION/SITUACION_LABORAL tags only.
- For subtask 2 (MEDDOPROF-CLASS), annotations are distributed in Brat standoff format with PACIENTE/FAMILIAR/SANITARIO/OTROS tags only.
- For subtask 3 (...
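As a rough illustration of what reading such Brat standoff files could involve, the sketch below parses the text-bound ("T") lines of an `.ann` file into label and character-offset tuples. It assumes the standard Brat layout (ID, tab, type and offsets, tab, surface text) and skips discontinuous spans; the function name `parse_brat_ann` and the sample annotations are hypothetical and are not taken from the corpus.

```python
from typing import List, NamedTuple


class Mention(NamedTuple):
    ann_id: str
    label: str
    start: int  # character offset of the first character
    end: int    # character offset just past the last character
    text: str


def parse_brat_ann(ann_text: str) -> List[Mention]:
    """Parse text-bound (T) annotations from the contents of a Brat .ann file."""
    mentions = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip notes, relations, attributes, blank lines
        ann_id, type_and_span, surface = line.split("\t", 2)
        label, span = type_and_span.split(" ", 1)
        if ";" in span:
            continue  # discontinuous spans are ignored in this sketch
        start, end = (int(x) for x in span.split())
        mentions.append(Mention(ann_id, label, start, end, surface))
    return mentions


# Hypothetical sample, made up for illustration only
sample_ann = (
    "T1\tPROFESION 24 30\tpintor\n"
    "T2\tSITUACION_LABORAL 40 48\tjubilado\n"
    "#1\tAnnotatorNotes T1\texample note\n"
)

for mention in parse_brat_ann(sample_ann):
    print(mention.label, mention.start, mention.end, mention.text)
```

Keeping the offsets alongside the label makes it straightforward to check each mention against the accompanying `.txt` file.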
The MEDDOPROF Shared Task tackles the detection of occupations and employment statuses in clinical cases in Spanish from different specialties. Systems capable of automatically processing clinical texts are of interest to the medical community, social workers, researchers, the pharmaceutical industry, computer engineers, AI developers, policy makers, citizens' associations and patients. Additionally, other NLP tasks (such as anonymization) can also benefit from this type of data. MEDDOPROF has three different sub-tasks: 1) MEDDOPROF-NER: Participants must find the beginning and end of occupation mentions and classify them as PROFESION (PROFESSION), SITUACION_LABORAL (WORKING_STATUS) or ACTIVIDAD (ACTIVITY). 2) MEDDOPROF-CLASS: Participants must find the beginning and end of occupation mentions and classify them according to their referent (PACIENTE [patient], FAMILIAR [family member], SANITARIO [health professional] or O...
The MEDDOPROF Shared Task tackles the detection of occupations and employment statuses in clinical cases in Spanish from different specialties. Systems capable of automatically processing clinical texts are of interest to the medical community, social workers, researchers, the pharmaceutical industry, computer engineers, AI developers, policy makers, citizen's associations and patients. Additionally, other NLP tasks (such as anonymization) can also benefit from this type of data. This is the reference coding list to be used for Task 3 (MEDDOPROF-NORM). It is a .tsv file that has three columns: code, label and alternative label. Codes from two sources are listed: ESCO and SNOMED-CT (these are preceded by the string 'SCTID:' in the list). With a few exceptions, professions are mapped to ESCO, while working statuses and activities are mapped to SNOMED-CT. MEDDOPROF is part of the IberLEF 2021 workshop, which is co-located with the SEPLN 2021 conference. For further informat...
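A minimal sketch of how such a three-column list might be loaded is given below. It assumes the file has no header row and that any code not starting with `SCTID:` comes from ESCO, as the description suggests; the function name `load_code_list` and the placeholder codes are hypothetical and do not correspond to real ESCO or SNOMED-CT entries.

```python
import csv
import io


def load_code_list(tsv_text):
    """Map each code to (source, label, alternative label)."""
    codes = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        code, label = row[0], row[1]
        alt_label = row[2] if len(row) > 2 else ""
        # Assumption: codes without the 'SCTID:' prefix come from ESCO
        source = "SNOMED-CT" if code.startswith("SCTID:") else "ESCO"
        codes[code] = (source, label, alt_label)
    return codes


# Placeholder rows, made up for illustration only
sample_codes = (
    "0000.0\texample profession\texample alternative label\n"
    "SCTID:000000000\texample working status\t\n"
)

for code, (source, label, alt_label) in load_code_list(sample_codes).items():
    print(code, source, label, alt_label)
```

If the distributed file does start with a header row, the first entry of the resulting dictionary would need to be dropped.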
SMM4H 2021 accepted at NAACL (scheduled in Mexico City in June) https://2021.naacl.org/. The ProfNER Shared Task encourages its participants to detect occupations and employment situations in Spanish tweets related to the COVID-19 situation. These guidelines describe the process followed by the clinical and linguist experts who manually annotated the ProfNER corpus. English version of ProfNER annotation guidelines. For further information, please visit https://temu.bsc.es/smm4h-spanish/ or email us at encargo-pln-life@bsc.es Resources: Web, Gold Standard corpus, Annotation Guidelines (in Spanish)
This document is a companion to the article "NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts" which includes more information on the Gold Standard corpus' content and the results of all submitted runs.
The MEDDOPROF Shared Task tackles the detection of occupations and employment statuses in clinical cases in Spanish from different specialties. Systems capable of automatically processing clinical texts are of interest to the medical community, social workers, researchers, the pharmaceutical industry, computer engineers, AI developers, policy makers, citizens' associations and patients. Additionally, other NLP tasks (such as anonymization) can also benefit from this type of data. These guidelines describe the process followed by the clinical and linguist experts who manually annotated the MEDDOPROF corpus, and a series of rules for annotating occupations in clinical texts. Annotation quality: We have performed a consistency analysis of the corpus. ~20% of the documents have been annotated by an internal annotator as well as by the linguist experts following these annotation guidelines. The average Inter-Annotator Agreemen...
The MEDDOPROF Shared Task tackles the detection of occupations and employment statuses in clinical cases in Spanish from different specialties. Systems capable of automatically processing clinical texts are of interest to the medical community, social workers, researchers, the pharmaceutical industry, computer engineers, AI developers, policy makers, citizens' associations and patients. Additionally, other NLP tasks (such as anonymization) can also benefit from this type of data. MEDDOPROF has three different sub-tasks: 1) MEDDOPROF-NER: Participants must find the beginning and end of occupation mentions and classify them as PROFESION (PROFESSION) or SITUACION_LABORAL (WORKING_STATUS). 2) MEDDOPROF-CLASS: Participants must find the beginning and end of occupation mentions and classify them according to their referent (PACIENTE [patient], FAMILIAR [family member], SANITARIO [health professional] or OTRO [other]). ...
Among the socio-demographic patient characteristics, occupations play an important role regarding not only occupational health, work-related accidents and exposure to toxic/pathogenic agents, but also their impact on general physical and mental health. This paper presents the Medical Documents Profession Recognition (MEDDOPROF) shared task (held within IberLEF/SEPLN 2021), focused on the recognition and normalization of occupations in medical documents in Spanish. MEDDOPROF proposes three challenges: NER (recognition of professions, employment statuses and activities in text), CLASS (classifying each occupation mention to its holder, i.e. patient or family member) and NORM (normalizing mentions to their identifier in ESCO or SNOMED CT). From the total of 40 registered teams, 15 submitted a total of 94 runs for the various sub-tracks. Best-performing systems were based on deep-learning technologies (incl. transformers) and achieved 0.818 F-score in occupation detection (NER), 0.793 ...
The digitalisation of our society has progressed rapidly over the last decade and many technological improvements have appeared before us, affecting the way we relate with each other, work and live. Without exception, the language services world has been substantially impacted by these technological advancements. In recent years, the widespread adoption of new technologies has resulted in researchers focusing their studies on very different topics, which changed swiftly. Initially, there was a change from translation memories and computer-assisted translation (CAT) tools to rule-based machine translation (Forcada et al. 2011). Soon after, with the increase in computer processing power and the emergence of large amounts of corpora available on the Internet, statistical machine translation appeared, which offered better automatic translations than the previous rule-based paradigm (Koehn 2010). In recent years, however, machine translation research has focused on a new paradigm: neural machine translation (Bentivogli et al., 2016). Currently, a new system is beginning to emerge, interactive machine translation, and early studies indicate that it can outperform the previous paradigm, as well as offer a number of additional advantages. These technologies are now intrinsic to the translation profession, and the Translation Studies field cannot be analysed without considering and knowing state-of-the-art technologies. Thus, this study undertakes a literature review of the translation technologies that have been used in recent years, and focuses mainly on describing current technologies and the path that they may open up in the near future, to help researchers grasp what the language services world is right now in terms of technological advancements.
This paper analyses the role of translators in a rapidly changing industry, which is strongly marked by digitalization and automation, and suggests the skills and competences translators will need to embrace to succeed in some branches of the industry of the (not-so-distant) future. This research is based on an in-depth review of language-related positions in the current job market, as well as on recent survey-based research that sought to understand what roles translators are taking up currently, supported by web scraping of LinkedIn job data with Python. In today’s globalised world, language-related services imply and encompass many more positions than translation alone. Different areas of specialisation are proposed, which may lead to successful and sustainable language-related positions in the age of automation by following and implementing the trends of the industry. We analyse industry trends and skills and competences that will play an important role in the (not-so-distant) future job market, and suggest a new, highly technologised, tech-symbiotic role: the language engineer. Language engineers are people with the required profiles to succeed in the automation age and will be able to commit to the multiple new, transversal language-related positions that appear as a result of recent technological developments. Numerous studies have highlighted the “benefits” of technology in terms of productivity increase. Current market trends have also resulted in studies questioning the sustainability of the translation profession. Undoubtedly, technology changes our lives, but it’s up to us whether it does so for good or bad. In our relation with technology, we can resist, cooperate or reinvent ourselves. We consider that defending a Luddite position (resistance to technology) will only bring negative consequences for the profession.
Therefore, we suggest the role of language engineers, who will not only cooperate with and benefit from technology but will also see their skills and competences augmented to meet industry requirements and be up to date with technological advancements.
Recent major changes and technological advances have consolidated machine translation (MT) as a key player to be considered in the language services world. In numerous instances, it is even an essential player due to budget and time constraints. Much attention has been paid to MT research recently, and MT use by professional or amateur users has increased. Yet, research has focused mainly on language combinations with huge amounts of online available corpora (e.g. English-Spanish). The situation for minoritized or stateless languages like Catalan is different. This study analyses Softcatalà’s new open-source, neural machine translation engine and compares it with Google Translate and Apertium in the English-Catalan language pair. Although MT engine developers use automatic metrics for MT engine evaluation, human evaluation remains the gold standard, despite its cost. Using TAUS DQF tools, translation quality (in terms of relative ranking, adequacy and fluency) and productivity (comparing editing times and distances) have been evaluated with the participation of 11 evaluators. Results show that Softcatalà's Translator offers higher quality and productivity than the other engines analysed.
The ProfNER Shared Task encourages its participants to detect occupations and employment situations in Spanish tweets related to the COVID-19 situation. These guidelines describe the process followed by the clinical and linguist experts who manually annotated the ProfNER corpus. Currently, the guidelines are only in Spanish. An English version will be available soon. For further information, please visit https://temu.bsc.es/smm4h-spanish/ or email us at encargo-pln-life@bsc.es
In the current context of technological advances and the development of artificial intelligence, the digitalization of societies and technological improvements are transforming our lives in every area. Translation is no exception. With the emergence of neural machine translation, a new machine translation paradigm, the quality offered by such systems has improved substantially, and it has even been claimed that it equals or surpasses the quality of human translation in certain fields such as news. However, specialized languages entail intrinsic complexities. In legal translation, the anisomorphism of legal language can be a very difficult gap for machines to bridge: disparate terms for the same concept in different legal systems, zero or partial equivalence, etc. Thus, the aim of this work is to study the usefulness of machine translation as a training resource in the legal translation classroom, ...
Gold Standard annotations for SMM4H-Spanish shared task. SMM4H 2021 accepted at NAACL (scheduled in Mexico City in June) https://2021.naacl.org/.
Introduction:
The entire corpus contains 10,000 annotated tweets. It has been split into training, validation and test (60-20-20). The current version contains the training and development set of the shared task with Gold Standard annotations.
In future versions of the dataset, test and background sets will be released. Annotations are distributed in 2 formats: Brat standoff and TSV. See Brat webpage for more information about Brat standoff format (https://brat.nlplab.org/standoff.html)
The TSV format follows the format employed in SMM4H 2019 Task 2:
Tweet ID, Begin, End, Class, text
In addition, we provide a tokenized version of the dataset, for participants' convenience. It follows the BIO format (similar to CONLL). The files were generated with the brat_to_c...
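To make the relationship between the TSV columns and a BIO-tagged version concrete, here is a small sketch under several assumptions: the TSV has no header row, the end offset is exclusive (as in Brat standoff), and tokens are split on whitespace. It is not the conversion script mentioned above; the function names, the tweet and its ID are all hypothetical.

```python
import csv
import io
import re

FIELDS = ["tweet_id", "begin", "end", "label", "text"]


def read_mentions(tsv_text):
    """Read (tweet_id, begin, end, label, text) tuples from a headerless TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=FIELDS,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(r["tweet_id"], int(r["begin"]), int(r["end"]), r["label"], r["text"])
            for r in reader]


def to_bio(tweet, spans):
    """Tag whitespace tokens with BIO labels from (begin, end, label) spans."""
    tagged = []
    for match in re.finditer(r"\S+", tweet):
        tag = "O"
        for begin, end, label in spans:
            # A token is tagged only if it lies entirely inside the span
            if match.start() >= begin and match.end() <= end:
                tag = ("B-" if match.start() == begin else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


# Hypothetical tweet and annotation, made up for illustration only
sample_tweet = "La enfermera de guardia atendió a todos"
sample_tsv = "1\t3\t23\tPROFESION\tenfermera de guardia\n"

mentions = read_mentions(sample_tsv)
spans = [(begin, end, label) for _, begin, end, label, _ in mentions]
for token, tag in to_bio(sample_tweet, spans):
    print(token, tag)
```

Whitespace tokenization leaves punctuation attached to words, so a mention followed directly by a comma would be missed here; a real tokenizer would be needed for anything beyond illustration.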
Among the socio-demographic patient characteristics, occupations play an important role regarding not only occupational health, work-related accidents and exposure to toxic/pathogenic agents, but also their impact on general physical and mental health. This paper presents the Medical Documents Profession Recognition (MEDDOPROF) shared task (held within IberLEF/SEPLN 2021), focused on the recognition and normalization of occupations in medical documents in Spanish. MEDDOPROF proposes three challenges: NER (recognition of professions, employment statuses and activities in text), CLASS (classifying each occupation mention to its holder, i.e. patient or family member) and NORM (normalizing mentions to their identifier in ESCO or SNOMED CT). From the total of 40 registered teams, 15 submitted a total of 94 runs for the various sub-tracks. Best-performing systems were based on deep-learning technologies (incl. transformers) and achieved 0.818 F-score in occupation detection (NER), 0.793 in classifying occupations to their referent (CLASS) and 0.619 in normalization (NORM). Future initiatives should also address multilingual aspects and application to other domains like social services, human resources, legal or job market data analytics and policymakers.
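For readers unfamiliar with the metric, the sketch below shows one common way an F-score can be computed for mention detection: strict matching of (document, start, end, label) tuples, with precision, recall and their harmonic mean. This is an assumption about the general approach, not a description of the official MEDDOPROF evaluation script, and the gold and predicted mentions are invented.

```python
def precision_recall_f1(gold, predicted):
    """Strict-match precision, recall and F1 over sets of mention tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented mentions: (document ID, start offset, end offset, label)
sample_gold = {
    ("doc1", 0, 5, "PROFESION"),
    ("doc1", 10, 18, "SITUACION_LABORAL"),
    ("doc2", 3, 9, "PROFESION"),
}
sample_pred = {
    ("doc1", 0, 5, "PROFESION"),   # exact match
    ("doc2", 3, 9, "ACTIVIDAD"),   # right span, wrong label: counted as an error
}

p, r, f1 = precision_recall_f1(sample_gold, sample_pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Under strict matching a prediction with the correct span but the wrong label earns no credit, which is one reason classification and normalization scores tend to sit below plain detection scores.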
In the current context of technological advancement and the development of artificial intelligence, the digitalization of our societies is transforming our lives in all areas. Translation is no exception. With the emergence of neural machine translation (NMT), a new paradigm of machine translation (MT), the quality offered by such systems has improved substantially, and some authors even claim that MT systems equal or surpass human translation quality in certain fields such as news. However, specialized genres involve intrinsic complexities. In legal translation, the anisomorphism of legal language can be a very difficult gap for machines to bridge: different terms for the same concept in different legal systems, zero or partial equivalence, and so on. This study carries out a human evaluation of three human translations of English-Spanish company contracts and one translation generated by an NMT engine. Results show that MT could be a very useful teaching tool in the legal translation classroom, making it possible to identify the skills that could be enhanced by such an approach. Finally, the study proposes how MT could be incorporated into the training of legal translators and presents the advantages it would have over traditional teaching-learning methods.
Detection of occupations in texts is relevant for a range of important application scenarios, such as competitive intelligence, sociodemographic analysis, legal NLP and health-related occupational data mining. Despite the importance of occupations and the heterogeneous data types that mention them, text mining efforts to recognize them have been limited. This is due to the lack of clear annotation guidelines and high-quality Gold Standard corpora. Social media data can be regarded as a relevant source of information for real-time monitoring of at-risk occupational groups in the context of pandemics such as COVID-19, facilitating intervention strategies for occupations in direct contact with infectious agents or affected by mental health issues. To evaluate current NLP methods and to generate resources, we organized the ProfNER track at SMM4H 2021, providing participants with a Gold Standard corpus of manually annotated tweets (human IAA of 0.919) following annotation guidelines available in Spanish and English, an occupation gazetteer, a machine-translated version of the tweets, and FastText embeddings. Out of 35 registered teams, 11 submitted a total of 27 runs. The best-performing participants built systems based on recent NLP technologies (e.g. transformers) and achieved an F-score of 0.93 in Text Classification and 0.839 in Named Entity Recognition. Corpus: https://doi.org/10.5281/zenodo.4309356
This report aims to explain what speech technologies are, their historical evolution, how they work and their current state of development. We also place special emphasis on their use and adoption in the field of linguistics, with particular relevance for people studying philology, translation or applied linguistics.
This report offers an introduction to the world of artificial intelligence (AI) from a non-technical point of view. It emphasizes the definition of key concepts in order to lay the groundwork for what AI is and make the term easier to understand, as it has recently gained great popularity in all areas of our society. After reading this report, readers will have a better understanding of what AI is, how it works and how it is changing the world around us.