Davide Buscaldi

Université Sorbonne Paris Nord / Sorbonne Paris Nord University, Laboratoire d'Informatique de Paris-Nord, Faculty Member

Université Sorbonne Paris Nord / Sorbonne Paris Nord University, IUT Villetaneuse, Faculty Member

Supervisors: Paolo Rosso
Address: 99 Avenue Jean Baptiste Clément
93430 Villetaneuse

Papers by Davide Buscaldi

Affect-based features for humour recognition

Abstract. Current trends in NLP focus on analysing knowledge beyond the language: moods, sentiments, attitudes, etc. In this paper we focused on studying the importance of affective information for humour recognition. Several experiments were performed over 7,500 blogs using some features reported in the literature, besides a set of new ones. A classification task was executed in order to verify the relevance of the features. The results indicate an interesting behaviour with regard to affective information.

Voice-QA: Evaluating the Impact of Misrecognized Words on Passage Retrieval

Abstract. Question Answering is an Information Retrieval task where the query is posed using natural language and the expected result is a concise answer. Voice-activated Question Answering systems represent an interesting application, where the question is formulated by speech. In these systems, an Automatic Speech Recognition module can be used to transcribe the question. Thus, recognition errors may be introduced, producing a significant effect on the answer retrieval process.

Quasar: The question answering system of the Universidad Politécnica de Valencia

Abstract. This paper describes the QUASAR Question Answering Information System developed by the RFIA group at the Departamento de Sistemas Informáticos y Computación of the Universidad Politécnica of Valencia for the 2005 edition of the CLEF Question Answering exercise. We participated in three monolingual tasks: Spanish, Italian and French, and in two cross-language tasks: Spanish to English and English to Spanish.

GeoTextMESS: result fusion with fuzzy Borda ranking in geographical information retrieval

Abstract. In this paper we discuss the integration of different GIR systems by means of a fuzzy Borda method for result fusion. Two of the systems, the one by the Universidad Politécnica de Valencia and the one by the Universidad de Jaén, participated in the GeoCLEF task under the name TextMess. The proposed result fusion method takes as input the document lists returned by the different systems and returns a document list where the documents are ranked according to the fuzzy Borda voting scheme.
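
The idea of fusing several ranked document lists by voting can be illustrated with a minimal sketch. Note that this uses the classic (crisp) Borda count, not the fuzzy Borda variant the abstract refers to, whose details are not given here; the function name `borda_fuse` and the document identifiers are hypothetical.

```python
from collections import defaultdict


def borda_fuse(ranked_lists):
    """Merge ranked document lists with the classic Borda count.

    Simplified illustration only: the fuzzy Borda scheme named in the
    abstract additionally uses preference intensities, omitted here.
    """
    # The candidates are all documents returned by at least one system.
    n = len({doc for ranking in ranked_lists for doc in ranking})
    scores = defaultdict(int)
    for ranking in ranked_lists:
        for position, doc in enumerate(ranking):
            # The top document earns n - 1 points, the next n - 2, and so on.
            # Documents a system did not return earn nothing from it.
            scores[doc] += n - 1 - position
    # Highest total first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    system_a = ["d1", "d2", "d3"]
    system_b = ["d2", "d3", "d1"]
    system_c = ["d2", "d1", "d4"]
    print(borda_fuse([system_a, system_b, system_c]))
```

A document ranked highly by several systems accumulates more points than one ranked highly by a single system, which is the intuition behind voting-based result fusion.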

IRIT: Textual Similarity Combining Conceptual Similarity with an N-Gram Comparison Method

Abstract. This paper describes the participation of the IRIT team to SemEval 2012 Task 6 (Semantic...

The impact of semantic and morphosyntactic ambiguity on automatic humour recognition

Abstract. Humour is one of the most amazing characteristics that define us as human beings and social entities. Its study requires deep insight into several areas such as linguistics, psychology or philosophy. From the Natural Language Processing (NLP) perspective, recent research has shown that humour can be automatically generated and recognized with some success.

An analysis of the impact of ambiguity on automatic humour recognition

Abstract. One of the most amazing characteristics that define the human being is humour. Its analysis involves a set of subjective and fuzzy factors, such as the linguistic, psychological or sociological variables that produce it. This is one of the reasons why its automatic processing is not straightforward. However, recent research in the Natural Language Processing area has shown that humour can automatically be generated and recognised with success.

TextMESS at GeoCLEF 2008: Result merging with fuzzy Borda ranking

Abstract. This paper describes the joint participation of the Universidad Politécnica de Valencia and the Universidad de Jaén in the GeoCLEF task. This activity has been carried out within the framework of the Spanish TextMESS project (Intelligent, Interactive and Multilingual Text Mining based on Human Language Technologies). The method employed for the participation is a result merging algorithm based on the fuzzy Borda voting scheme.

Evaluation protocol and tools for question-answering on speech transcripts

Abstract. Question Answering (QA) technology aims at providing relevant answers to natural language questions. Most Question Answering research has focused on mining document collections containing written texts to answer written questions. In addition to written sources, a large (and growing) amount of potentially interesting information appears in spoken documents, such as broadcast news, speeches, seminars, meetings or telephone conversations.

Geooreka: enhancing web searches with geographical information

Abstract. Geographical information is gaining increasing importance on the World Wide Web. Recently, the web saw a growth in the use of map-based services; however, these services are usually used as visual yellow pages rather than search engines. Finding a web page relevant to a specific topic and a specific area is still mostly dependent on classical keyword-based methods.

NLEL-MAAT at CLEF-IP

This report presents the work carried out at NLE Lab for the CLEF-IP 2009 competition. We adapted the JIRS passage retrieval system for this task, with the objective of exploiting the stylistic characteristics of the patents. Since JIRS was developed for the Question Answering task and this is the first time its model was used to compare entire documents, we had to carry out some transformations on the patent documents.

Tag semantics for the retrieval of XML documents

Abstract. Word Sense Disambiguation (WSD), in the field of Natural Language Processing (NLP), consists in assigning the correct sense (semantics) to a word form (lexeme) by means of the context in which the lexeme is found. In this paper we investigate the possibility of applying WSD techniques to the field of Information Retrieval, especially to the retrieval of XML documents. We consider two methods to automatically assign semantic values to XML tags on the basis of the tagged text they contain.

Some experiments in question answering with a disambiguated document collection

Abstract. This paper describes our approach to the Question Answering-Word Sense Disambiguation task. This task consists in carrying out Question Answering over a disambiguated document collection. In our approach, disambiguated documents are used to improve the accuracy of the retrieval phase. In order to do this, we added a WordNet-expanded index to the document collection. The expanded index contains synonyms, hypernyms and holonyms of the words already in the documents.
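
As a rough illustration of what an expanded index of this kind might look like, the sketch below builds a plain inverted index alongside a second index holding related terms. The `RELATED` table is a hand-written, hypothetical stand-in for WordNet lookups, and the function names are assumptions; the actual system described in the paper may differ substantially.

```python
from collections import defaultdict

# Hypothetical stand-in for WordNet: maps a word to related terms
# (synonyms, hypernyms, holonyms). A real system would query WordNet
# for the disambiguated sense of each word instead.
RELATED = {
    "car": {"automobile", "vehicle"},  # a synonym and a hypernym
    "paris": {"france"},               # a holonym
}


def build_indexes(docs):
    """Return (plain, expanded) indexes mapping term -> set of doc ids."""
    plain = defaultdict(set)
    expanded = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            plain[word].add(doc_id)
            # Index the document under each related term as well.
            for term in RELATED.get(word, ()):
                expanded[term].add(doc_id)
    return plain, expanded


def search(term, plain, expanded):
    """Match a query term against both the plain and the expanded index."""
    term = term.lower()
    return plain.get(term, set()) | expanded.get(term, set())


if __name__ == "__main__":
    docs = {1: "A car in Paris", 2: "The vehicle stopped"}
    plain, expanded = build_indexes(docs)
    print(search("vehicle", plain, expanded))
```

With the expanded index, a query for "vehicle" also retrieves the document that only mentions "car", since "vehicle" was stored as one of its related terms at indexing time.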

The UPV at GeoCLEF 2008: The GeoWorSE system

Abstract. This year our system was complemented with a map-based filter. During the indexing phase, all places are disambiguated and assigned their coordinates on the map. These coordinates are stored in a separate index. The search process is carried out in two phases: in the first one, we search the collection with the same method applied in 2007, which exploits the expansion of index terms by means of WordNet synonyms and holonyms.

NLEL at CLEF 2009 Robust WSD Task

Abstract. This report describes our approach to the Robust-Word Sense Disambiguation task. We applied the same index expansion technique used in 2008 for the Question Answering WSD task, with the addition of pseudo (blind) relevance feedback. In our approach, a WordNet-expanded index is generated from the disambiguated document collection. This index contains synonyms, hypernyms and holonyms of the disambiguated words contained in documents.

From humor recognition to irony detection: The figurative language of social media

The research described in this paper is focused on analyzing two playful domains of language: humor and irony, in order to identify key values components for their automatic processing. In particular, we focus on describing a model for recognizing these phenomena in social media, such as “tweets”. Our experiments are centered on five data sets retrieved from Twitter, taking advantage of user-generated tags such as “#humor” and “#irony”.

Verb sense disambiguation using support vector machines: impact of WordNet-extracted features

Abstract. The disambiguation of verbs is usually considered to be more difficult than that of other part-of-speech categories. This is due both to the high polysemy of verbs compared with the other categories, and to the lack of lexical resources providing relations between verbs and nouns. One such resource is WordNet, which provides plenty of information and relationships for nouns, whereas it is less comprehensive with respect to verbs.

GIR Pharma: a geographic information retrieval approach to locate pharmacies on duty

Abstract. This paper describes an approach based on geographic information retrieval aimed at providing some solutions to the problem of searching for pharmacies on duty in the Spanish territory. It is a novel investigation, which requires collaboration between multidisciplinary teams and is beginning to show its first progress.

Word Sense Disambiguation with and without Supervision (Paolo Rosso, Francesco Masulli, Davide Buscaldi; Dpto. de Sistemas Informáticos y Computación, Univ. Politécnica de Valencia, Spain)

Abstract. Word Sense Disambiguation (WSD) is one of the most important open problems in Natural Language Processing. One of the most successful current lines of research in WSD is the corpus-based approach, in which machine learning algorithms are applied to learn statistical models or classifiers from corpora. When a machine learning approach learns from previously semantically annotated corpora it is said to be supervised, whereas when it does not use sense-tagged data during training it is called unsupervised.

Context expansion with global keywords for a conceptual density-based WSD

The resolution of lexical ambiguity, which is commonly referred to as Word Sense Disambiguation, is still an open problem in the field of Natural Language Processing. An approach to Word Sense Disambiguation based on Conceptual Density, a measure of the correlation between concepts, obtained good results with small context windows. This paper presents a method to integrate global knowledge, expressed as global keywords, in this approach.

Ontologies et Recherche d'Information

In this talk (in French) I gave a quick overview of my work since the Ph.D., with Ontologies and Information Retrieval as the leitmotif.
(The talk was intended as a presentation to my new research group.)

In this seminar I gave a brief introduction to my work since the doctorate, using ontologies and information retrieval as the common thread.

Ph.D. Dissertation

The Role of Toponym Disambiguation in Information Retrieval and Question Answering