Abstract. This report addresses the problem of maintaining linguistic data collections adequate to the needs of different applications. We posit that when developing NLP applications, one has to manage not only the software development process, but also the linguistic data: handling them separately will reduce the complexity of the process as a whole, thereby increasing the overall quality. Data consistency is also improved since there is only one collection to manage.
The fifth Young Researchers' Roundtable on Spoken Dialogue Systems (YRRSDS'09) was held at Queen Mary University in London last month. The event, organized by young (and not so young!) researchers, most of whom were participants in the 2008 edition in Columbus, Ohio, brought together 41 researchers from academia and industry, from a wide variety of institutions and from several continents. Many had also participated in the SIGDial or Interspeech conferences with which the event is affiliated.
Abstract Minimal Universal Extra Dimensions (MUED) models predict the existence of massive Kaluza-Klein particles with final states containing Standard Model leptons and jets. Multilepton final states provide the cleanest signature.
This paper explores the impact of light filtering on automatic key phrase extraction (AKE) applied to Broadcast News (BN). Key phrases are words and expressions that best characterize the content of a document. Key phrases are often used to index the document or as features in further processing. This makes improvements in AKE accuracy particularly important. We hypothesized that filtering out marginally relevant sentences from a document would improve AKE accuracy. Our experiments confirmed this hypothesis.
Abstract In automatic summarization, centrality-as-relevance means that the most important content of an information source, or a collection of information sources, corresponds to the most central passages, considering a representation where such a notion makes sense (graph, spatial, etc.). We assess the main paradigms and introduce a new centrality-based relevance model for automatic summarization that relies on the use of support sets to better estimate the relevant content.
Abstract This paper describes an ontology for the cooking domain, reporting on the ontology building process, its life cycle, the methodologies applied, the decisions taken, and the results achieved. In the past, our research group built a generic dialogue system able to manage specific devices at home, such as TVs, lamps and windows. The cooking domain appeared as an interesting research area, where our technologies could be applied, and techniques could be explored in order to make the system more independent from new domains.
Abstract We explore the use of topic-based, automatically acquired prior knowledge in speech summarization, assessing its influence across several term weighting schemes. All information is combined using latent semantic analysis as a core procedure to compute the relevance of the sentence-like units of the given input source. Evaluation is performed using the self-information measure, which tries to capture the informativeness of the summary in relation to the summarized input source.
In this paper we describe the dictionary and the operation of the Palavroso morphological analyzer for the Portuguese language, and the process of reusing the linguistic resources it contains to build a lexicon for Portuguese, within the scope of the LE-PAROLE project, that is, as part of the European Union's policy for the development of Language Engineering.
Abstract We present a dynamic multilingual repository for multi-source, multilevel linguistic data descriptions. The repository is able to integrate and merge multiple/concurrent descriptions of linguistic entities and allows existing relationships to be extended and new ones created. In addition, the repository can also store metadata, allowing for richer descriptions. We present results from work on large data collections and preview developments resulting from ongoing work.
To improve the quality of the speech produced by a Text-to-Speech (TTS) system, it is important to obtain the maximum amount of information from the input text that may help in this task. This covers a wide range of possibilities, from the simple conversion of non-orthographic items to more complex syntactic and semantic analysis. In this paper, we present the development of a morphosyntactic tagging system and analyze its influence on the performance of a TTS system for European Portuguese.
Abstract This paper describes several issues concerning the reusability of linguistic resources, with special emphasis on morphosyntactic tagging. Ribeiro (2003) presents a morphosyntactic tagging system with a modular architecture. What are the consequences of changing a module of this system? How difficult would it be to integrate the morphosyntactic tagger in other systems?
Abstract We address the common and recurring problem of data reuse, focusing on the following topics: (i) the current state of affairs (in particular, problems with data); (ii) requirements for change; (iii) the proposed solution (its problems and advantages, as well as related work in this area), including the canonical-, I/O-, and data transformation models; (iv) maintenance issues; (v) implementation and deployment aspects; (vi) conclusions and future directions, including results from work done so far and aspects that merit future work.
Abstract The purpose of this paper is to present the development of a morphosyntactic disambiguation system (or part-of-speech tagging system) intended to be used as a component of a Text-to-Speech (TTS) system for European Portuguese. In the development of the tagger, we compared two approaches: a probabilistic approach and a hybrid approach. Besides comparing these two approaches, this paper considers the effects of the different classes of errors on the performance of the complete TTS system.
ABSTRACT Finding the starting time of musical notes in an audio signal, that is, performing onset detection, is an important task, as this information can be used as the basis for high-level musical processing tasks. Many different methods exist to perform onset detection. However, their results depend on a Peak Selection step that decides whether an onset is present at some point in time.
Abstract. The Question Interpretation module of QA@L2F, the question-answering system from L2F/INESC-ID, is thoroughly described in this paper, as well as the frame formalism it employs. Moreover, the anaphora resolution process introduced this year, based on frame manipulation, is detailed.
Abstract This paper describes the participation of QA@L2F, the question-answering system from L2F/INESC-ID, at the QA track of CLEF in 2008.
This paper presents QA@L2F, the question-answering system developed at L2F, INESC-ID. QA@L2F follows different strategies according to the question type, and relies strongly on named entity recognition and on the pre-detection of linguistic patterns. Each question type is mapped to a single strategy; however, if no answer is found, the system proceeds and tries to find an answer using one of the other strategies.
Abstract: This paper describes the strategy used by the L2F team for performing automatic sentiment analysis and topic classification over Spanish Twitter data. The L2F system achieved the best results in the topic classification contest and second place in sentiment analysis.
Abstract In the last few years, automatic processing tools and corpus-based studies have become of great importance for the community. The possibility of evaluating and developing such tools and studies depends on the availability of language resources. For the Portuguese language in its several national varieties, these resources are not enough to meet the community's needs.
Abstract This paper introduces L2F's (INESC-ID) question-answering system and presents its results in the QA@CLEF07 evaluation task. QA@L2F bases its performance on a high-quality deep linguistic analysis of the question, which is strongly based on named entity recognition. However, if a precise analysis is not possible or if no answer is found in previously processed data, the system can also relax its analysis and try to find an answer using a flexible pattern matching approach.
Papers by Ricardo Ribeiro