We propose a language-portable Named Entity detection module developed and tested on Spanish and Portuguese. The influence of different feature sets on the classification task was studied and demonstrated. The differences in the language models learned by three data-driven systems performing the same NLP tasks were examined, and the systems were combined to yield higher accuracy than the best individual system. Three NE classifiers (Hidden Markov Models, Maximum Entropy and a Memory-based learner) are trained on the same corpus data and, after comparison, their outputs are combined using a voting strategy. Results are encouraging: language-portable detection achieved a 92.96% f-score for Spanish and a 78.86% f-score for Portuguese. For Spanish, classification based on the language-portable detection reached a 78.59% f-score. Compared with the systems competing in CoNLL-2002, our system reaches third place.
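As an illustration of combining classifier outputs by voting, the following is a minimal sketch, not the authors' implementation. It assumes each system emits one tag per token, and it breaks ties with a fixed priority order; the abstract does not specify a tie-breaking rule, and the system names and tags below are hypothetical.

```python
from collections import Counter


def majority_vote(predictions, priority):
    """Combine per-token tags from several taggers by majority vote.

    predictions: dict mapping a system name to its list of tags.
    priority: system names in tie-breaking order (an assumption of this
              sketch; the source does not say how ties are resolved).
    """
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all systems must tag the same number of tokens")
    if set(priority) != set(predictions):
        raise ValueError("priority must list every system exactly once")

    combined = []
    for i in range(lengths.pop()):
        votes = Counter(tags[i] for tags in predictions.values())
        top = max(votes.values())
        tied = {tag for tag, count in votes.items() if count == top}
        # Take the tag of the highest-priority system among the most-voted tags.
        for name in priority:
            if predictions[name][i] in tied:
                combined.append(predictions[name][i])
                break
    return combined


# Hypothetical outputs of three taggers over three tokens (BIO tags).
outputs = {
    "hmm": ["B", "O", "B"],
    "maxent": ["B", "I", "O"],
    "memory_based": ["O", "I", "I"],
}
combined_tags = majority_vote(outputs, priority=["maxent", "hmm", "memory_based"])
```

In this toy run the first two tokens are decided by a two-to-one majority, while the third is a three-way tie resolved in favour of the first system in the priority list.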
This research presents NLP-Opt, an Auto-ML technique for optimizing pipelines of machine learning algorithms that can be applied to different Natural Language Processing tasks. The process of selecting the algorithms and their parameters is modelled as an optimization problem, and a technique is proposed to find an optimal combination based on the metaheuristic Population-Based Incremental Learning (PBIL). For validation purposes, this approach is applied to a standard opinion mining problem. NLP-Opt effectively optimizes the algorithms and parameters of pipelines. Additionally, NLP-Opt outputs probabilistic information about the optimization process, revealing the most relevant components of pipelines. The proposed technique can be applied to different Natural Language Processing problems, and the information provided by NLP-Opt can be used by researchers to gain insights into the characteristics of the best-performing pipelines. The source code is made available for other researcher...
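To make the PBIL idea concrete, here is a minimal sketch of a categorical PBIL loop over pipeline choices. It is not the NLP-Opt code: the slot names, the options, the toy scoring function and all parameter values are hypothetical, and a real run would replace `toy_score` with an evaluation of the pipeline on the task.

```python
import random


def pbil(options, score, generations=30, pop_size=20, learning_rate=0.2, seed=0):
    """Search over one choice per pipeline slot with a PBIL-style update.

    options: dict mapping a slot name to its list of candidate components.
    score: function taking a {slot: component} dict and returning a number.
    Returns the best pipeline seen, its score, and the learned probabilities.
    """
    rng = random.Random(seed)
    # Start from a uniform distribution over the options of every slot.
    probs = {slot: [1.0 / len(c)] * len(c) for slot, c in options.items()}
    best, best_score = None, float("-inf")

    for _ in range(generations):
        # Sample a population of pipelines from the current distributions.
        population = [
            {slot: rng.choices(c, weights=probs[slot])[0] for slot, c in options.items()}
            for _ in range(pop_size)
        ]
        gen_best = max(population, key=score)
        gen_score = score(gen_best)
        if gen_score > best_score:
            best, best_score = gen_best, gen_score
        # Shift each slot's distribution towards the generation's best pipeline.
        for slot, choices in options.items():
            chosen = choices.index(gen_best[slot])
            probs[slot] = [
                (1 - learning_rate) * p + learning_rate * (1.0 if i == chosen else 0.0)
                for i, p in enumerate(probs[slot])
            ]
    return best, best_score, probs


# Hypothetical search space and toy objective, for illustration only.
search_space = {
    "tokenizer": ["whitespace", "regex"],
    "classifier": ["naive_bayes", "svm", "logistic_regression"],
}


def toy_score(pipeline):
    bonus = {"regex": 0.1, "svm": 0.3, "logistic_regression": 0.2}
    return sum(bonus.get(component, 0.0) for component in pipeline.values())


best_pipeline, best_value, learned_probs = pbil(search_space, toy_score)
```

The returned `learned_probs` plays the role of the probabilistic information mentioned in the abstract: slots whose distribution concentrates on one option are the ones the search found most decisive.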
The increasing amount of subjective data on the Web is creating the need to develop effective Question Answering systems able to discriminate such information from factual data and subsequently process it with specific methods. The participants in the IBEREVAL OM tasks will be given a set of opinion questions (in Spanish and English). Optionally, they will also be able to receive the same set of opinion questions with the source, target and expected polarity given, as well as the time span the question refers to. They will also be provided with a collection of blog posts, extracted using the Technorati blog search engine (in Spanish and English), in which the answers to the opinion questions should be found. The gold standard for this blog post collection will be annotated beforehand by three annotators using the EmotiBlog scheme. The EmotiBlog corpus and the set of questions presented in (Balahur et al., 2009), in their present state, will be provided for s...
Dossier Next is a web application that automatically and periodically crawls and classifies information from any kind of Internet source with textual content, e.g. web pages, digital newspapers, official bulletins, etc., extracting only the information most interesting to the user and discarding the rest. To begin collecting information, the user must indicate the sources from which information is to be obtained, identifying each of the parts of interest for information retrieval, i.e. title, body, date, author, etc. So that the system can decide which content is relevant to the user, the user must also provide a set of keywords that must appear in the document in question. Once this configuration has been set, the system proceeds to download all the information the user wants. From the home page, documents can be viewed by date and source. Dossier ...
Papers by Andres Montoyo