
Ricardo Ribeiro


Papers by Ricardo Ribeiro

Creating and Maintaining Multi-purpose Lexical Knowledge

Abstract. This report addresses the problem of maintaining linguistic data collections that meet the needs of different applications. We posit that when developing NLP applications, one has to manage not only the software development process but also the linguistic data. Handling the two separately reduces the complexity of the process as a whole, thereby increasing overall quality. Data consistency also improves, since there is only one collection to manage.

Young researchers face-to-face on human-machine dialogue

The fifth Young Researchers' Roundtable on Spoken Dialogue Systems (YRRSDS'09) was held at Queen Mary University in London last month. The event was organized by young (and not so young!) researchers, most of whom had participated in the 2008 edition in Columbus, Ohio. It brought together 41 researchers from academia and industry, representing a wide variety of institutions on several continents. Many had also participated in the SIGDial or Interspeech conferences, with which the event is affiliated.

INSTITUTO SUPERIOR TÉCNICO

Abstract: Minimal Universal Extra Dimension (MUED) models predict the existence of part...

Key Phrase Extraction of Lightly Filtered Broadcast News

This paper explores the impact of light filtering on automatic key phrase extraction (AKE) applied to Broadcast News (BN). Key phrases are words and expressions that best characterize the content of a document. Key phrases are often used to index the document or as features in further processing. This makes improvements in AKE accuracy particularly important. We hypothesized that filtering out marginally relevant sentences from a document would improve AKE accuracy. Our experiments confirmed this hypothesis.
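The abstract does not specify the filtering criterion or the AKE method. The sketch below illustrates the idea under two stand-in assumptions, neither of which is the paper's method:

- sentence relevance is the cosine similarity between the sentence and the whole document;
- candidates are single words ranked by raw frequency.

The names `light_filter`, `keep_ratio` and `extract_key_phrases` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, minimal stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for", "at", "by", "with"}


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,;:!?\"'").lower() for w in sentence.split())
    return [t for t in tokens if t]


def content_bag(sentence):
    """Term counts for a sentence, ignoring stopwords."""
    return Counter(t for t in tokenize(sentence) if t not in STOPWORDS)


def centroid_similarity(sentences):
    """Cosine similarity between each sentence and the whole document (a relevance proxy)."""
    bags = [content_bag(s) for s in sentences]
    doc = Counter()
    for bag in bags:
        doc.update(bag)
    doc_norm = math.sqrt(sum(v * v for v in doc.values()))
    scores = []
    for bag in bags:
        norm = math.sqrt(sum(v * v for v in bag.values()))
        dot = sum(v * doc[t] for t, v in bag.items())
        scores.append(dot / (norm * doc_norm) if norm and doc_norm else 0.0)
    return scores


def light_filter(sentences, keep_ratio=0.8):
    """Keep the most relevant sentences (about keep_ratio of them), in original order."""
    scores = centroid_similarity(sentences)
    k = max(1, round(len(sentences) * keep_ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def extract_key_phrases(sentences, top_n=3):
    """Rank single-word candidates by frequency in the (filtered) text."""
    counts = Counter()
    for s in sentences:
        counts.update(content_bag(s))
    return [term for term, _ in counts.most_common(top_n)]
```

Filtering is "light" because only the lowest-scoring sentences are dropped before candidates are counted. Most of the document survives.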

Revisiting Centrality-as-Relevance: Support Sets and Similarity as Geometric Proximity

Abstract In automatic summarization, centrality-as-relevance means that the most important content of an information source, or of a collection of information sources, corresponds to the most central passages. This assumes a representation in which such a notion makes sense (graph, spatial, etc.). We assess the main paradigms and introduce a new centrality-based relevance model for automatic summarization that relies on support sets to better estimate the relevant content.
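The sketch below illustrates the support-set idea under three assumptions: bag-of-words vectors, cosine similarity as the proximity measure, and a single fixed similarity threshold. The abstract leaves the metric and threshold open.

- Each passage collects the other passages that lie close to it into its support set.
- Passages that appear in the most support sets are ranked as most central.

```python
import math
from collections import Counter


def vectorize(passage):
    """Bag-of-words term counts (a stand-in for any passage representation)."""
    return Counter(passage.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def support_sets(passages, threshold=0.5):
    """For each passage, the indices of the other passages within the proximity threshold."""
    vecs = [vectorize(p) for p in passages]
    return [
        {j for j, vj in enumerate(vecs) if j != i and cosine(vi, vj) >= threshold}
        for i, vi in enumerate(vecs)
    ]


def rank_by_support(passages, threshold=0.5):
    """Order passage indices by how many support sets contain them (ties: original order)."""
    sets = support_sets(passages, threshold)
    counts = [sum(j in s for s in sets) for j in range(len(passages))]
    return sorted(range(len(passages)), key=lambda j: (-counts[j], j))
```

A summary would then take the first few indices of `rank_by_support`. The threshold controls how "local" each support set is.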

Ontology construction: cooking domain

Abstract This paper describes an ontology for the cooking domain, reporting on the ontology-building process, its life cycle, the methodologies applied, the decisions taken, and the results achieved. Our research group previously built a generic dialogue system able to control specific household devices, such as TVs, lamps, and windows. The cooking domain emerged as an interesting research area in which our technologies could be applied. It also offered a setting to explore techniques that make the system more domain-independent.

Using prior knowledge to assess relevance in speech summarization

Abstract We explore the use of automatically acquired, topic-based prior knowledge in speech summarization, assessing its influence across several term-weighting schemes. All information is combined using latent semantic analysis (LSA) as the core procedure for computing the relevance of the sentence-like units of the input source. Evaluation uses the self-information measure, which attempts to capture the informativeness of the summary relative to the summarized input source.
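The abstract names LSA as the core relevance procedure. The sketch below shows one common LSA-based scoring scheme, not necessarily the paper's, assuming raw term counts and a single latent dimension. It works in three steps:

1. Build a term-by-sentence matrix A.
2. Find A's dominant right singular vector v and singular value sigma by power iteration on A^T A.
3. Score each sentence j by |v_j| * sigma.

The prior-knowledge weighting studied in the paper is not modeled here.

```python
import math
from collections import Counter


def term_sentence_matrix(sentences):
    """Rows are terms, columns are sentences, entries are raw counts."""
    bags = [Counter(s.lower().split()) for s in sentences]
    vocab = sorted(set().union(*bags))
    return [[bag[t] for bag in bags] for t in vocab]


def dominant_right_singular(a, iters=200):
    """Power iteration on A^T A; returns (v, sigma) for the largest singular value."""
    n = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(ata[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all-zero matrix: no latent topic
            return v, 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient v^T (A^T A) v equals sigma^2 for the converged unit vector v.
    lam = sum(v[i] * sum(ata[i][j] * v[j] for j in range(n)) for i in range(n))
    return v, math.sqrt(max(lam, 0.0))


def lsa_scores(sentences):
    """Relevance of each sentence along the dominant latent topic: |v_j| * sigma."""
    v, sigma = dominant_right_singular(term_sentence_matrix(sentences))
    return [abs(x) * sigma for x in v]
```

Using more latent dimensions, or a library SVD, would be the natural next step. Power iteration keeps the sketch dependency-free.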

Linguistic Resources and Morphological Processing of Portuguese: PALAVROSO and the LE-PAROLE Project

In this article we describe the dictionary and the operation of the Palavroso morphological analyzer for Portuguese. We also describe how the linguistic resources it contains were reused to build a Portuguese lexicon within the LE-PAROLE Project, as part of the European Union's policy for the development of Language Engineering.

How to integrate data from different sources

Abstract We present a dynamic multilingual repository for multi-source, multilevel linguistic data descriptions. The repository can integrate and merge multiple, concurrent descriptions of linguistic entities, and it allows existing relationships to be extended and new ones to be created. It can also store metadata, allowing for richer descriptions. We present results from work on large data collections and preview developments resulting from ongoing work.

Using Morphosyntactic Information in TTS Systems: Comparing Strategies for European Portuguese

To improve the quality of the speech produced by a Text-to-Speech (TTS) system, it is important to extract from the input text as much information as possible that may help in this task. This covers a wide range of possibilities, from the simple conversion of non-orthographic items to more complex syntactic and semantic analysis. In this paper, we present the development of a morphosyntactic tagging system and analyze its influence on the performance of a TTS system for European Portuguese.

Reusing linguistic resources: a case study in morphosyntactic tagging

Abstract This paper describes several issues concerning the reusability of linguistic resources, with special emphasis on morphosyntactic tagging. Ribeiro (2003) presents a morphosyntactic tagging system with a modular architecture. What are the consequences of changing a module of this system? How difficult would it be to integrate the morphosyntactic tagger into other systems?

Rethinking reusable resources

Abstract We address the common and recurring problem of data reuse, focusing on the following topics: (i) the current state of affairs (in particular, problems with data); (ii) requirements for change; (iii) the proposed solution (its problems and advantages, as well as related work in this area), including the canonical-, I/O-, and data transformation models; (iv) maintenance issues; (v) implementation and deployment aspects; and (vi) conclusions and future directions, including results from work done so far and aspects that merit future work.

Morphosyntactic Disambiguation for TTS Systems

Abstract The purpose of this paper is to present the development of a morphosyntactic disambiguation system (or part-of-speech tagger) intended for use as a component of a Text-to-Speech (TTS) system for European Portuguese. In developing the tagger, we compared two approaches: a probabilistic approach and a hybrid approach. Besides comparing these two approaches, this paper considers the effects of the different classes of errors on the performance of the complete TTS system.
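For the probabilistic approach, a standard choice is a bigram hidden Markov model decoded with the Viterbi algorithm. The sketch below assumes that setup. The tagset (`DET`, `N`, `V`) and all probabilities are invented for illustration. This is not the tagger described in the paper, and the hybrid approach is not shown.

```python
import math


def _log(p):
    """Natural log, mapping zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Most likely tag sequence under a bigram HMM, computed in log space.

    start_p[t]    : P(first tag = t)
    trans_p[s][t] : P(tag t | previous tag s)
    emit_p[t][w]  : P(word w | tag t); unseen words get probability `unk`
    """
    # best[t] = (log-prob of the best path ending in tag t, that path)
    best = {t: (_log(start_p.get(t, 0.0)) + _log(emit_p[t].get(words[0], unk)), [t]) for t in tags}
    for word in words[1:]:
        step = {}
        for t in tags:
            # Best predecessor s for tag t at this position.
            score, path = max(
                ((best[s][0] + _log(trans_p[s].get(t, 0.0)), best[s][1]) for s in tags),
                key=lambda item: item[0],
            )
            step[t] = (score + _log(emit_p[t].get(word, unk)), path + [t])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


# Toy model with invented probabilities, for illustration only.
TAGS = ["DET", "N", "V"]
START = {"DET": 0.6, "N": 0.3, "V": 0.1}
TRANS = {
    "DET": {"N": 0.9, "V": 0.1},
    "N": {"V": 0.7, "N": 0.2, "DET": 0.1},
    "V": {"DET": 0.6, "N": 0.3, "V": 0.1},
}
EMIT = {
    "DET": {"o": 0.5, "a": 0.5},
    "N": {"gato": 0.4, "casa": 0.4, "a": 0.01},
    "V": {"come": 0.5, "casa": 0.3},
}
```

With these toy numbers, `viterbi("o gato come a casa".split(), TAGS, START, TRANS, EMIT)` returns `["DET", "N", "V", "DET", "N"]`. The transition probabilities resolve the ambiguous "casa", which the toy model also allows as a verb.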

Influence of Peak Selection Methods on Onset Detection

Abstract Finding the starting times of musical notes in an audio signal, that is, performing onset detection, is an important task, as this information can serve as the basis for high-level musical processing tasks. Many different methods exist for onset detection. However, their results depend on a peak selection step that decides whether an onset is present at a given point in time.
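One widely used peak selection scheme, not necessarily among those compared in the paper, works on the onset detection function. It keeps local maxima that exceed an adaptive threshold, `delta + lam * median(window)`. The function name, parameter names and defaults below are illustrative assumptions.

```python
from statistics import median


def pick_peaks(odf, window=2, delta=0.1, lam=1.0):
    """Return indices n where the onset detection function odf has a local
    maximum that exceeds delta + lam * (median of odf over n +/- window)."""
    onsets = []
    for n in range(1, len(odf) - 1):
        lo, hi = max(0, n - window), min(len(odf), n + window + 1)
        threshold = delta + lam * median(odf[lo:hi])
        is_local_max = odf[n] > odf[n - 1] and odf[n] >= odf[n + 1]
        if is_local_max and odf[n] > threshold:
            onsets.append(n)
    return onsets
```

The median term lets the threshold follow slow changes in the signal level, while `delta` suppresses small ripples. Lowering `delta` trades false negatives for false positives.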

Question Interpretation in QA@L2F

Abstract. This paper describes in detail the Question Interpretation module of QA@L2F, the question-answering system from L2F/INESC-ID, and the frame formalism it employs. It also details the anaphora resolution process introduced this year, which is based on frame manipulation.

QA@L2F, Second Steps at QA@CLEF

Abstract This paper describes the participation of QA@L2F, the question-answering system from L2...

QA@L2F, First Steps at QA@CLEF

This paper presents QA@L2F, the question-answering system developed at L2F, INESC-ID. QA@L2F follows different strategies according to the question type, and relies strongly on named entity recognition and on the pre-detection of linguistic patterns. Each question type is mapped to a single strategy. However, if no answer is found, the system tries to find one using the other strategies.
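The mapping from question type to strategy, with fallback to the other strategies, can be sketched as follows. The function `answer`, the strategy registry, and the convention that a strategy returns `None` when it finds nothing are all hypothetical.

```python
from typing import Callable, Dict, Optional

# A strategy takes the question text and returns an answer, or None if it finds nothing.
Strategy = Callable[[str], Optional[str]]


def answer(question: str, qtype: str, strategies: Dict[str, Strategy]) -> Optional[str]:
    """Try the strategy mapped to the question type first, then the remaining ones in order."""
    order = [qtype] + [name for name in strategies if name != qtype]
    for name in order:
        strategy = strategies.get(name)
        if strategy is None:  # the question type may have no registered strategy
            continue
        result = strategy(question)
        if result is not None:
            return result
    return None
```

Keeping the fallback order explicit makes it easy to put cheaper or more precise strategies first.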

The L2F Strategy for Sentiment Analysis and Topic Classification

Abstract: This paper describes the strategy used by the L2F team for performing automatic sentime...

Some language resources and tools for computational processing of Portuguese at INESC

Abstract In the last few years, automatic processing tools and corpus-based studies have become very important to the community. Developing and evaluating such tools and studies depends on the availability of language resources. For the Portuguese language, in its several national varieties, these resources are not sufficient to meet the community's needs.

QA@L2F@QA@CLEF

Abstract This paper introduces L2F's (INESC-ID) question-answering system and presents its results in the QA@CLEF07 evaluation task. QA@L2F bases its performance on a high-quality deep linguistic analysis of the question, which relies strongly on named entity recognition. However, a precise analysis may not be possible, or no answer may be found in previously processed data. In those cases, the system relaxes its approach and tries to find an answer using flexible pattern matching.
