This paper briefly describes our system as one solution to the problem of global topological localization in indoor environments. The experiment involves analyzing images acquired with a perspective camera mounted on a robot platform and applying a feature-based method (SIFT) and two main systems to search and classify the given images. To obtain acceptable results and improved performance, the algorithm has two main maturity levels: one capable of running in ...
This paper describes the preliminary results of a system for extracting sentiments expressed with regard to named entities. It combines rule-based classification, statistics, and machine learning in a new method. The accuracy and speed of extraction and classification are crucial. The service-oriented architecture lets the end user work with a flexible interface to build applications that range from aggregating consumer feedback on commercial products to measuring public opinion on political issues from ...
Digital libraries have a key role in cultural heritage as they provide access to our culture and history by indexing books and historical documents (newspapers and letters). Digital libraries use natural language processing (NLP) tools to process these documents and enrich them with meta-information, such as named entities. Despite recent advances in these NLP models, most of them are built for specific languages and contemporary documents and are not optimized for handling historical material, which may, for instance, contain language variations and optical character recognition (OCR) errors. In this work, we focus on the entity linking (EL) task, which is fundamental to the indexing of documents in digital libraries. We developed a Multilingual Entity Linking architecture for HIstorical preSS Articles that is composed of multilingual analysis, OCR correction, and filter analysis to alleviate the impact of historical documents in the EL task. The source code is publicly available. E...
In this paper we describe a system that participated in the fourth benchmarking activity ImageCLEF, in the Robot Vision task, for which we approach topological localization without using the temporal continuity of the image sequences. We provide details on the state-of-the-art methods that were selected: Color Histograms, SIFT (Scale Invariant Feature Transform), ASIFT (Affine SIFT), RGB-SIFT, and a Bag-of-Visual-Words strategy inspired by the text retrieval community. We focused on finding the optimal set of features, and an in-depth analysis was carried out. We offer an analysis of the different features and similarity measures, and a performance evaluation of combinations of the proposed methods for topological localization. We also detail a genetic algorithm that was used for eliminating false-positive results. In the end, we draw several conclusions targeting the advantages of using proper configurations of visual-based appearance descriptors, similarity measures and clas...
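As a rough illustration of appearance-based localization with color histograms, the sketch below labels a query image with the room of its most similar training image. It is not the system from the abstract: histogram intersection is assumed here as one possible similarity measure, and the room names, toy histograms, and the helpers `histogram_intersection` and `localize` are hypothetical.

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1 (left unchanged if empty)."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima of the normalized histograms."""
    return sum(min(a, b) for a, b in zip(normalize(h1), normalize(h2)))


def localize(query_hist, labeled_hists):
    """Return the room label of the training histogram most similar to the query."""
    best_label, _ = max(
        labeled_hists,
        key=lambda item: histogram_intersection(query_hist, item[1]),
    )
    return best_label


# Toy 3-bin color histograms (hypothetical data, one per training image).
training = [
    ("corridor", [8, 1, 1]),
    ("kitchen", [1, 8, 1]),
    ("office", [2, 2, 6]),
]
query = [7, 2, 1]
predicted_room = localize(query, training)
```

Because each image is classified on its own, a nearest-neighbour scheme of this kind does not rely on the temporal continuity of the image sequence.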
Information Extraction systems must cope with two problems: they depend heavily on the domain under consideration, yet the cost of developing a domain-specific system is high. We propose a new solution for role labeling in the event-extraction task that relies on using unsupervised word representations (word embeddings) as word features. We automatically learn domain-relevant distributed representations from a domain-specific unlabeled corpus without complex linguistic processing and use these features in a supervised classifier. Our experimental results on the MUC-4 corpus show that this system outperforms state-of-the-art systems on this event extraction task, especially when the amount of annotated data is small. We also show that using word representations induced on a domain-relevant dataset achieves better results than using more general word embeddings.
In this paper, we propose a recent and under-researched paradigm for the task of event detection (ED) by casting it as a question-answering (QA) problem with the possibility of multiple answers and the support of entities. The extraction of event triggers is thus transformed into the task of identifying answer spans from a context, while also focusing on the surrounding entities. The architecture is based on a pre-trained and fine-tuned language model, where the input context is augmented with entities marked at different levels, their positions, their types, and, finally, the argument roles. Experiments on the ACE 2005 corpus demonstrate that the proposed paradigm is a viable solution for the ED task and that it significantly outperforms the state-of-the-art models. Moreover, we show that our methods are also able to extract unseen event types.
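To make the QA formulation concrete, the sketch below builds one question-context-answers example in which entities are wrapped in typed markers and event triggers are the answers. It only illustrates the data format, under assumptions: the marker syntax, the question template, and the helpers `mark_entities` and `build_qa_example` are hypothetical and are not taken from the paper.

```python
def mark_entities(tokens, entities):
    """Wrap each entity span in <TYPE> ... </TYPE> markers (hypothetical format).

    entities: non-overlapping (start, end_exclusive, type) token spans.
    """
    starts = {start: etype for start, _, etype in entities}
    ends = {end: etype for _, end, etype in entities}
    marked = []
    for i, token in enumerate(tokens):
        if i in starts:
            marked.append(f"<{starts[i]}>")
        marked.append(token)
        if i + 1 in ends:
            marked.append(f"</{ends[i + 1]}>")
    return marked


def build_qa_example(tokens, entities, event_type, trigger_indices):
    """Cast trigger detection for one event type as a QA example."""
    return {
        # Hypothetical question template, one question per event type.
        "question": f"What is the trigger of the {event_type} event?",
        "context": " ".join(mark_entities(tokens, entities)),
        # Zero, one, or several answers; an empty list means no such event.
        "answers": [tokens[i] for i in trigger_indices],
    }


tokens = "Police arrested two men in Paris".split()
entities = [(0, 1, "PER"), (5, 6, "GPE")]
example = build_qa_example(tokens, entities, "Arrest-Jail", [1])
no_event = build_qa_example(tokens, entities, "Attack", [])
```

Allowing an empty or multi-element answer list is what lets a single sentence carry no trigger, one trigger, or several triggers for the queried event type.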
This article addresses the task of event detection, which aims to identify and categorize event mentions in texts. One of the difficulties of this task is the problem of event mentions that correspond to misspelled, very specific, or out-of-vocabulary words. To analyze the impact of accounting for them through character-level models, we propose integrating character embeddings, which can capture morphological and shape information about words, into a convolutional model for event detection. More precisely, we evaluate two strategies for performing such an integration and show that a late fusion approach outperforms both an early fusion approach and models that integrate character or subword information, such as ELMo or BERT.
This paper summarizes the participation of the Laboratoire Informatique, Image et Interaction (L3i laboratory) of the University of La Rochelle in the Recognizing Ultra Fine-grained Entities (RUFES) track within the Text Analysis Conference (TAC) series of evaluation workshops. Our participation relies on two neural models: one based on a pre-trained and fine-tuned language model with a stack of Transformer layers for fine-grained entity extraction, and one out-of-the-box model for within-document entity coreference. We observe that our approach has great potential for increasing the performance of fine-grained entity recognition. The future work envisioned is therefore to enhance the ability of the models following additional experiments and a deeper analysis of the results.
This paper presents the TLR participation in the FinNum-2 task. Our system is based on a Transformer architecture improved by a pre-processing strategy for numeral attachment identification. Instead of relying on a vanilla attention mechanism, we focus the attention on specific tokens that are essential for the task. The results on an unseen test collection show that our model generalises correctly, as our best run outperforms those of all other participants in terms of F1-macro (the official metric). Further results show the robustness of our method, and experiments with two alternatives (with and without parameter tuning) lead to an additional improvement of 4% over our best run. ACM Reference Format: Jose G. Moreno, Emanuela Boros, and Antoine Doucet. 2018. TLR at the NTCIR-15 FinNum-2 Task: Improving Text Classifiers for Numeral Attachment in Financial Social Data. In Woodstock ’18: ACM Symposium on Neural Gaze Detection, June 03–05, 2018, Woodstock, NY ...
In this paper, we present the different methods proposed for the FinSIM-2 Shared Task 2021 on Learning Semantic Similarities for the Financial domain. The main focus of this task is to evaluate the classification of financial terms into corresponding top-level concepts (also known as hypernyms) that were extracted from an external ontology. We approached the task as a semantic textual similarity problem. By relying on a siamese network with pre-trained language model encoders, we derived semantically meaningful term embeddings and computed similarity scores between them in a ranked manner. Additionally, we report the results of different baselines in which the task is tackled as a multi-class classification problem. The proposed methods outperformed our baselines and demonstrated the robustness of the models based on a textual-similarity siamese network.
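The ranking step can be sketched as follows, assuming the term and concept embeddings have already been produced by some encoder. Cosine similarity is assumed as the scoring function, and the toy vectors, the concept names, and the helper `rank_hypernyms` are hypothetical; the siamese encoder itself is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_hypernyms(term_vec, concept_vecs):
    """Return (concept, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(term_vec, vec)) for name, vec in concept_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional embeddings (hypothetical; real ones would come from the encoder).
concepts = {
    "Bonds": [0.9, 0.1, 0.0],
    "Equity Index": [0.1, 0.9, 0.1],
    "Funds": [0.2, 0.2, 0.9],
}
term = [0.8, 0.2, 0.1]
ranking = rank_hypernyms(term, concepts)
```

Returning the full ranked list rather than a single label allows rank-based evaluation, in contrast with the multi-class classification baselines.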
This paper describes UAIC’s Question Answering for Machine Reading Evaluation systems participating in the QA4MRE 2012 evaluation task. We submitted two types of runs: the first based on our system from the 2011 edition of QA4MRE, and the second based on a Textual Entailment system. For the second type of runs, we construct the Text and the Hypothesis required by the Textual Entailment system from the initial test data (the tag was used to build the Text and the and tags were used to build the Hypothesis). The results provided by the organizers showed that the second type of runs performed better than the first for English.
From the point of view of natural language processing (NLP), extracting events from texts is the most complex form of information extraction, which more generally covers the extraction of named entities and of the relations that link them in texts. The case of events is particularly difficult because an event can be viewed as an n-ary relation or as a configuration of relations. While research in information extraction has benefited greatly from manually annotated datasets for learning models that analyze texts, the availability of these resources remains a significant problem. Furthermore, many machine-learning-based approaches to information extraction rely on the ability to extract from texts large sets of manually defined features using sophisticated NLP tools. As a result, adapting to a new domain constitutes an additional challenge...
Papers by Emanuela Boros