Davide Buscaldi
  • 99 Avenue Jean Baptiste Clément
    93430 Villetaneuse
Abstract Current trends in NLP focus on analysing knowledge beyond language itself: moods, sentiments, attitudes, etc. In this paper we study the importance of affective information for humour recognition. Several experiments were performed over 7,500 blogs using features reported in the literature together with a set of new ones. A classification task was carried out to verify the relevance of the features. The results indicate an interesting behaviour with regard to affective information.
Abstract Question Answering is an Information Retrieval task where the query is posed using natural language and the expected result is a concise answer. Voice-activated Question Answering systems represent an interesting application, where the question is formulated by speech. In these systems, an Automatic Speech Recognition module can be used to transcribe the question. Thus, recognition errors may be introduced, producing a significant effect on the answer retrieval process.
Abstract. This paper describes the QUASAR Question Answering Information System developed by the RFIA group at the Departamento de Sistemas Informáticos y Computación of the Universidad Politécnica of Valencia for the 2005 edition of the CLEF Question Answering exercise. We participated in three monolingual tasks: Spanish, Italian and French, and in two cross-language tasks: Spanish to English and English to Spanish.
Abstract. In this paper we discuss the integration of different GIR systems by means of a fuzzy Borda method for result fusion. Two of the systems, the one by the Universidad Politécnica de Valencia and the one by the Universidad of Jaén, participated in the GeoCLEF task under the name TextMess. The proposed result fusion method takes as input the document lists returned by the different systems and returns a document list in which the documents are ranked according to the fuzzy Borda voting scheme.
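The fusion step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below assumes one common formulation of fuzzy Borda voting: each system prefers document i over document j with intensity r_ij = w_i / (w_i + w_j), and a document collects only the intensities above 0.5. The names `fuzzy_borda` and `runs`, and the choice to give unretrieved documents a weight of 0, are assumptions made for illustration.

```python
from collections import defaultdict

def fuzzy_borda(runs):
    """Fuse ranked lists; `runs` is a list of dicts doc_id -> non-negative score."""
    docs = sorted({doc for run in runs for doc in run})
    totals = defaultdict(float)
    for run in runs:
        for i in docs:
            wi = run.get(i, 0.0)  # assumption: unretrieved documents weigh 0
            for j in docs:
                if i == j:
                    continue
                wj = run.get(j, 0.0)
                if wi + wj == 0:
                    continue  # this system retrieved neither: no preference
                r_ij = wi / (wi + wj)  # preference intensity of i over j
                if r_ij > 0.5:
                    totals[i] += r_ij
    # Highest accumulated preference first; ties broken by document id.
    return sorted(docs, key=lambda d: (-totals[d], d))

runs = [{"a": 0.9, "b": 0.3}, {"b": 0.8, "c": 0.2}]
print(fuzzy_borda(runs))
```

In this toy input, document "b" is retrieved by both systems and ends up first, which is the behaviour one would hope for from a voting-based fusion.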
Abstract This paper describes the participation of the IRIT team in SemEval 2012 Task 6 (Semantic Textual Similarity). The method combines an n-gram-based comparison with a conceptual similarity measure that uses WordNet to calculate the similarity between a pair of concepts.
Abstract. Humour is one of the most amazing characteristics that define us as human beings and social entities. Its study requires deep insight into several areas, such as linguistics, psychology and philosophy. From the Natural Language Processing (NLP) perspective, recent research has shown that humour can be automatically generated and recognized with some success.
Abstract. One of the most amazing characteristics that define the human being is humour. Its analysis involves a set of subjective and fuzzy factors, such as the linguistic, psychological or sociological variables that produce it. This is one of the reasons why its automatic processing is not straightforward. However, recent research in the Natural Language Processing area has shown that humour can be automatically generated and recognised with success.
Abstract This paper describes the joint participation of the Universidad Politécnica de Valencia and the Universidad of Jaén in the GeoCLEF task. This activity has been carried out within the framework of the Spanish TextMESS project (Intelligent, Interactive and Multilingual Text Mining based on Human Language Technologies). The method employed for the participation is a result merging algorithm based on the fuzzy Borda voting scheme.
Abstract Question Answering (QA) technology aims at providing relevant answers to natural language questions. Most Question Answering research has focused on mining document collections containing written texts to answer written questions. In addition to written sources, a large (and growing) amount of potentially interesting information appears in spoken documents, such as broadcast news, speeches, seminars, meetings or telephone conversations.
Abstract. Geographical information is gaining importance on the World Wide Web. Recently, the web has seen a growth in the use of map-based services; however, these services are usually used as visual yellow pages rather than search engines. Finding a web page relevant to a specific topic and a specific area still depends mostly on classical keyword-based methods.
This report presents the work carried out at NLE Lab for the CLEF-IP 2009 competition. We adapted the JIRS passage retrieval system for this task, with the aim of exploiting the stylistic characteristics of the patents. Since JIRS was developed for the Question Answering task and this is the first time its model was used to compare entire documents, we had to carry out some transformations on the patent documents.
Abstract Word Sense Disambiguation (WSD), in the field of Natural Language Processing (NLP), consists in assigning the correct sense (semantics) to a word form (lexeme) by means of the context in which the lexeme is found. In this paper we investigate the possibility of applying WSD techniques to the field of Information Retrieval, especially to the retrieval of XML documents. We consider two methods to automatically assign semantic values to XML tags on the basis of the text they enclose.
Abstract. This paper describes our approach to the Question Answering-Word Sense Disambiguation task. This task consists in carrying out Question Answering over a disambiguated document collection. In our approach, disambiguated documents are used to improve the accuracy of the retrieval phase. In order to do this, we added a WordNet-expanded index to the document collection. The expanded index contains synonyms, hypernyms and holonyms of the words already in the documents.
Abstract This year our system was complemented with a map-based filter. During the indexing phase, all places are disambiguated and assigned their coordinates on the map. These coordinates are stored in a separate index. The search process is carried out in two phases: in the first one, we search the collection with the same method applied in 2007, which exploits the expansion of index terms by means of WordNet synonyms and holonyms.
Abstract This report describes our approach to the Robust-Word Sense Disambiguation task. We applied the same index expansion technique used in 2008 for the Question Answering WSD task, with the addition of pseudo (blind) relevance feedback. In our approach, a WordNet expanded index is generated from the disambiguated document collection. This index contains synonyms, hypernyms and holonyms of the disambiguated words contained in documents.
The research described in this paper focuses on analyzing two playful domains of language, humor and irony, in order to identify key components for their automatic processing. In particular, we describe a model for recognizing these phenomena in social media, such as “tweets”. Our experiments are centered on five data sets retrieved from Twitter, taking advantage of user-generated tags such as “#humor” and “#irony”.
Abstract. The disambiguation of verbs is usually considered to be more difficult than that of other part-of-speech categories. This is due both to the high polysemy of verbs compared with the other categories, and to the lack of lexical resources providing relations between verbs and nouns. One such resource is WordNet, which provides plenty of information and relationships for nouns but is less comprehensive with respect to verbs.
Abstract This paper describes an approach based on geographic information retrieval that aims to address the problem of searching for pharmacies on duty in the Spanish territory. It is a novel line of investigation, which requires collaboration between multidisciplinary teams and is beginning to show its first results.
Abstract Word Sense Disambiguation (WSD) is one of the most important open problems in Natural Language Processing. One of the most successful current lines of research in WSD is the corpus-based approach, in which machine learning algorithms are applied to learn statistical models or classifiers from corpora. When a machine learning approach learns from previously semantically annotated corpora it is said to be supervised, whereas when it does not use sense tagged data during training it is called unsupervised.
The resolution of lexical ambiguity, which is commonly referred to as Word Sense Disambiguation, is still an open problem in the field of Natural Language Processing. An approach to Word Sense Disambiguation based on Conceptual Density, a measure of the correlation between concepts, obtained good results with small context windows. This paper presents a method to integrate global knowledge, expressed as global keywords, in this approach.
Abstract The objective of this research work is to study the effects of toponym (place name) ambiguity in the Geographical Information Retrieval (GIR) task. Our experience with GIR systems shows that toponym ambiguity may be an important factor in the inability of these systems to take advantage of geographical knowledge. Previous studies on ambiguity and Information Retrieval (IR) suggested that disambiguation may be useful in some specific IR scenarios. We suppose that GIR may constitute such a scenario.
This paper describes a method developed for the Robust-Word Sense Disambiguation task at CLEF 2009. In our approach, a WordNet expanded index is generated from the disambiguated document collection. This index contains synonyms, hypernyms and holonyms of the disambiguated words contained in documents. Query words are supplemented with terms extracted by means of a pseudo relevance feedback technique.
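To make the pseudo relevance feedback step concrete, here is a minimal sketch under stated assumptions. The abstract does not say how feedback terms are selected, so the code simply adds the most frequent unseen terms from the top-ranked documents of an initial search; the function name `expand_query` and the parameters `top_k` and `n_terms` are hypothetical.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, top_k=2, n_terms=2):
    """Add the most frequent new terms from the top_k documents to the query."""
    seen = set(query_terms)
    counts = Counter(
        term
        for doc in ranked_docs[:top_k]   # assume the top documents are relevant
        for term in doc.lower().split()
        if term not in seen
    )
    # Sort by descending frequency, then alphabetically for a stable result.
    best = sorted(counts, key=lambda t: (-counts[t], t))[:n_terms]
    return list(query_terms) + best

ranked = [
    "river flood damage valencia",
    "flood damage report valencia river",
    "unrelated text",
]
print(expand_query(["river", "flood"], ranked))
```

A real system would normally filter stopwords and weight candidate terms rather than count them; both are left out here to keep the sketch short.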
Abstract. This report describes the participation of the NLEL Lab. from the Universidad Politécnica of Valencia in the RespubliQA task at CLEF 2010. The system designed for this participation is based on the one used in our previous participation, with some modifications required in order to adapt it to the new guidelines. The system participated in both the “Paragraph Selection” (PS) and “Answer Selection” (AS) subtasks. Keywords: Question Answering, n-gram based Passage Retrieval
Abstract In recent years, geography has acquired great importance in the context of Information Retrieval (IR) and, in general, of the automated processing of information in text. Mobile devices that are able to surf the web and at the same time report their position are now a common reality, together with applications that can exploit this data to provide users with locally customised information, such as directions or advertisements.
Abstract This paper presents a fully automatic method that performs word sense disambiguation of nouns by computing the conceptual density of each sense of the noun to be disambiguated. The method was evaluated on the SemCor corpus with a context of only two nouns, obtaining a precision of 81.5% and a recall of 60.2%. Keywords: noun sense disambiguation, conceptual density.
Abstract. This paper investigates the effectiveness of using the redundancy of the web for solving the Word Sense Disambiguation task. The web-based algorithm looks for adjective-noun pairs on the web to disambiguate an English noun. Preliminary results show that a better precision than the baseline is obtained, but with a low recall. Moreover, the web seems to be more effective than WordNet Domains when integrated rather than stand-alone.
Abstract This paper describes the participation of a mixed approach in GeoCLEF-2006. We participated in the Monolingual English Task and present the joint work of three groups belonging to the R2D2 project, with a new system that combines the three individual systems of the teams.
Abstract Legal texts usually comprise many kinds of texts, such as contracts, patents and treaties. These texts usually include a huge quantity of unstructured information written in natural language. Thanks to automatic analysis and Information Retrieval (IR) techniques, it is possible to filter out information that is not relevant and, therefore, to reduce the number of documents that users need to browse to find the information they are looking for.
Abstract One of the first scenarios imagined by researchers in Artificial Intelligence was the problem of conversing with a machine in natural language. In 1950 Alan Turing proposed a test to check the capability of a machine to demonstrate intelligence, and that test, which bears his name, is mostly based on conversation and language understanding. Obtaining responses to questions has always been an ambition of human beings.
Abstract This report describes our approach to the Question Answering-Word Sense Disambiguation task. In our approach, disambiguated documents are used to improve the retrieval phase: this has been implemented by adding a WordNet expanded index to the document collection. This index contains synonyms, hypernyms and holonyms of the document words. Question words are searched for in both the expanded WordNet index and the default index.
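The two-index search described in this abstract can be sketched as follows. This is an illustration only: the `RELATED` dictionary is a hand-made stand-in for WordNet lookups, and the down-weighting of matches in the expanded index (`expanded_weight`) is an assumption, since the abstract does not say how the two indexes are combined.

```python
from collections import defaultdict

# Hypothetical stand-in for WordNet: word -> synonyms, hypernyms and holonyms.
RELATED = {
    "valencia": {"spain", "city"},
    "car": {"automobile", "vehicle"},
}

def build_indexes(docs):
    """Build a default inverted index and a separate expanded index."""
    default, expanded = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            default[word].add(doc_id)
            for related in RELATED.get(word, ()):
                expanded[related].add(doc_id)
    return default, expanded

def search(question, default, expanded, expanded_weight=0.5):
    """Look up question words in both indexes and rank documents by score."""
    scores = defaultdict(float)
    for word in question.lower().split():
        for doc_id in default.get(word, ()):
            scores[doc_id] += 1.0
        for doc_id in expanded.get(word, ()):
            scores[doc_id] += expanded_weight  # assumed weight, not from the paper
    return sorted(scores, key=lambda d: (-scores[d], d))

docs = {"d1": "festival in valencia", "d2": "festival in paris"}
default, expanded = build_indexes(docs)
print(search("festival spain", default, expanded))
```

Here the question word "spain" never occurs in the documents, yet "d1" is ranked first because the expanded index links "valencia" to its holonym.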
Abstract In this work we attempted to determine the relative importance of the geographical and WordNet-extracted terms with respect to the remainder of the query. Our system is based on Lucene and uses LingPipe for Named Entity recognition. Geographical terms are expanded with WordNet holonyms and synonyms and indexed separately. We checked the relative importance of the terms by boosting them with reduction factors (0.75, 0.5 and 0.25).
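As a rough illustration of the boosting experiment, the sketch below builds query strings in which the geographical terms carry one of the three reduction factors. It assumes the `term^boost` syntax of the classic Lucene query parser; the helper `build_query` and the example terms are hypothetical, and the abstract does not state how the factors were actually applied.

```python
def build_query(base_terms, geo_terms, factor):
    """Return a query string in which geographical terms are down-weighted."""
    boosted = [f"{term}^{factor}" for term in geo_terms]
    return " ".join(list(base_terms) + boosted)

# One query variant per reduction factor mentioned in the abstract.
for factor in (0.75, 0.5, 0.25):
    print(build_query(["flood"], ["valencia", "spain"], factor))
```

Running the same topic with each factor and comparing retrieval quality is one way to estimate how much the geographical terms contribute relative to the rest of the query.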
This paper describes the work done by the RFIA group at the Departamento de Sistemas Informáticos y Computación of the Universidad Politécnica of Valencia for the 2007 edition of the CLEF Question Answering task. We participated in the Spanish monolingual task only. A series of technical difficulties prevented us from completing all the tasks we had signed up for. Our 2006 system was modified in order to comply with the 2007 guidelines, especially with regard to anaphora resolution, which was tackled with a web-based anaphora resolution module.
Abstract This paper presents a simple approach to the Wikipedia Question Answering pilot task in CLEF 2006. The approach ranks the snippets, retrieved using the Lucene search engine, by means of a similarity measure based on bags of words extracted from both the snippets and the articles in Wikipedia. Our participation was in the monolingual English and Spanish tasks.
Abstract. In this paper we make a first attempt to evaluate the potential of diversity in the Geographical Information Retrieval task. This task represents an opportunity to take advantage of diversity, given that documents are relevant not only from a thematic point of view, but also spatially. A user of a GIR system may be interested in results that are geographically distributed and equally relevant.
Abstract. This is a preliminary report of the work carried out in order to introduce “spontaneous” questions into QAST at CLEF 2009. QAST (Question Answering in Speech Transcripts) is a track of the CLEF campaign. The aim of this report is to show how difficult it can be to generate “spontaneous” questions and the importance of taking into account the real information needs of users for the evaluation of question answering systems.
Abstract This report describes the work done by the RFIA group at the Departamento de Sistemas Informáticos y Computación of the Universidad Politécnica of Valencia for the 2005 edition of the CLEF Question Answering task. We participated in three monolingual tasks: Spanish, Italian and French, and in two cross-language tasks: Spanish to English and English to Spanish.
Abstract. The analysis of processes related to cognitive phenomena through Natural Language Processing techniques is becoming more relevant every day. Opinion Mining, Sentiment Analysis and Automatic Humour Recognition are examples of how this kind of research is growing. In this paper we focus on the study of how the features that define a corpus of humorous data (one-liners) may be used for obtaining a set of parameters that allow us to build a primitive taxonomy of humour.
Abstract: In this paper we present a technique for the selection of the best translation of a short query among a set of translations obtained from different translators. The technique is based on the calculation of the information entropy of the query with respect to the web. This technique may be used in multilingual applications such as Cross-Lingual Question Answering. Keywords: Machine Translation, Multilingual Question Answering, Web Mining
Question Answering (QA) can be viewed as a particular form of Information Retrieval (IR), in which the amount of information to return is the minimum required to satisfy the user needs expressed by a specific question such as: “Where is the Europol Drugs Unit?” A Passage Retrieval (PR) system is an IR system which, given a list of keywords (e.g. “Electricity”, “Motor”, etc.) or a question such as the previous one, returns fragments of text (passages) that are relevant to the user needs.
ABSTRACT The analysis of processes related to cognitive phenomena through Natural Language Processing techniques is becoming more relevant every day. Areas such as Opinion Mining, Sentiment Analysis and Automatic Humor Recognition are examples of how this kind of research is gradually growing. In this paper, we study the features that define a corpus of humorous data (one-liners) to assess whether they may be used as elements for building a taxonomy of verbal humor.
Abstract. Geographical information is gaining importance on the World Wide Web. Every day, the number of users looking for geographically constrained information grows. Map-based services, such as Google or Yahoo Maps, provide users with a graphical interface, visualizing results on maps. However, most of the geographical information contained in web documents is represented by means of toponyms, which in many cases are ambiguous.
Abstract Humour is an amazing and challenging topic. Despite the many analyses and studies aimed at understanding its complex mechanisms, it is not completely defined. Studies performed from the Natural Language Processing perspective have demonstrated that, by taking into account linguistic resources, statistical methods, machine learning and corpus-based techniques, humour may be handled by computational systems in order to generate and recognise it automatically.
Abstract Word sense disambiguation (WSD) is the process of assigning a meaning to a word based on the context in which it occurs. The absence of sense tagged training data is a real problem for the word sense disambiguation task. We present a method for the resolution of lexical ambiguity which relies on the use of the wide-coverage noun taxonomy of WordNet and the notion of conceptual distance among concepts, captured by a conceptual density formula developed for this purpose.
In this paper we present a study carried out over toponyms contained in an Italian news collection, in order to determine the degree of ambiguity of toponyms and how difficult it could be to resolve such ambiguities. The results show that frequent toponyms are usually less ambiguous than rare toponyms. The resolution of ambiguities on a sample of 1,042 toponyms with different features confirms that ambiguous toponyms are spatially autocorrelated.
In this paper, we present a Question Answering system based on redundancy and a Passage Retrieval method that is specifically oriented to Question Answering. We suppose that in a large enough document collection the answer to a given question may appear in several different forms. Therefore, it is possible to find one or more sentences that contain the answer and that also include tokens from the original question. The Passage Retrieval engine is almost language-independent since it is based on n-gram structures. Question classification and answer extraction modules are based on shallow patterns.
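The idea of ranking passages by the question n-grams they contain can be sketched as below. This is a deliberate simplification and not the system's actual weighting model: a passage is scored by the length of the longest question n-gram it contains, divided by the question length. The function names `ngrams`, `ngram_score` and `rank_passages` are hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_score(question, passage):
    """Length of the longest shared n-gram, normalised by question length."""
    q = question.lower().split()
    p = passage.lower().split()
    for n in range(len(q), 0, -1):      # try the longest n-grams first
        if ngrams(q, n) & ngrams(p, n):
            return n / len(q)
    return 0.0

def rank_passages(question, passages):
    """Order passages by descending n-gram score (stable for ties)."""
    return sorted(passages, key=lambda p: -ngram_score(question, p))

passages = [
    "croatia has a long coast",
    "zagreb is the capital of croatia",
]
print(rank_passages("capital of croatia", passages))
```

Because the score relies only on token sequences, the same code works for any language that can be split into tokens, which is in the spirit of the near language independence claimed in the abstract.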
Nowadays, a huge quantity of information is stored in digital format. A great portion of this information consists of textual and unstructured documents, where geographical references are usually given by means of place names. A common problem in textual information retrieval is posed by polysemous words, that is, words that can have more than one sense. This problem is also present in the geographical domain: place names may refer to different locations in the world. In this paper we investigate the use of our word sense disambiguation technique in the geographical domain, with the aim of resolving ambiguous place names. Our technique is based on WordNet conceptual density. Due to the lack of a reference corpus tagged with WordNet senses, we carried out the experiments over a set of 1,210 place names extracted from the SemCor corpus, which we named GeoSemCor and made publicly available. We compared our method with the most-frequent-sense baseline and the enhanced-Lesk method, which had not previously been tested in large contexts. The results show that a better precision can be achieved by using a small context (phrase level), whereas a greater coverage can be obtained by using large contexts (document level). The proposed method should be tested with other corpora, because our experiments revealed an excessive bias towards the most frequent sense in GeoSemCor.
Toponym Disambiguation, i.e. the task of assigning to a place name its correct referent in the world, is receiving increasing attention from researchers. Many methods have been proposed so far, making use of different resources, techniques and sense inventories. Unfortunately, a gold standard for the evaluation of those methods is not yet available; therefore, it is difficult to verify their performance. Recently, a georeferenced version of WordNet has been developed, a resource that can be used to compare methods that are based on geographical data with methods that use textual information. In this paper we carry out a comparison between two of these methods. The results show that the knowledge-based method obtained better results with a smaller context size. On the other hand, we observed that the map-based method needs a large context to obtain good accuracy.
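To give a rough idea of what a map-based approach can look like, the sketch below picks, among the candidate referents of an ambiguous place name, the one closest to the centroid of the other places mentioned in the context. The abstract does not specify the method at this level of detail, so the centroid heuristic, the function names, and the naive averaging of latitude and longitude are assumptions; the coordinates are approximate and serve only as an illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def disambiguate_by_map(candidates, context_points):
    """Return the candidate label whose coordinates lie closest to the
    centroid of the context points (hypothetical heuristic)."""
    if not context_points:
        raise ValueError("at least one context point is required")
    # Naive centroid: plain average of latitudes and longitudes.
    centroid = (
        sum(p[0] for p in context_points) / len(context_points),
        sum(p[1] for p in context_points) / len(context_points),
    )
    return min(candidates, key=lambda label: haversine_km(candidates[label], centroid))


# Approximate coordinates, for illustration only.
candidates = {
    "Cambridge, England": (52.2, 0.12),
    "Cambridge, Massachusetts": (42.37, -71.11),
}
context = [(51.5, -0.13), (51.75, -1.26)]  # roughly London and Oxford
choice = disambiguate_by_map(candidates, context)
```

With only one or two context toponyms the centroid is easily dominated by a single place, which is consistent with the observation that a map-based method benefits from a larger context.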
WordNet has been used extensively as a resource for the Word Sense Disambiguation (WSD) task, both as a sense inventory and a repository of semantic relationships. Recently, we investigated the possibility of using it as a resource for the Geographical Information Retrieval task, more specifically for the toponym disambiguation task, which could be considered a specialization of WSD. We found that it would be very useful to assign to geographical entities in WordNet their coordinates, especially in order to implement geometric shape-based disambiguation methods. This paper presents Geo-WordNet, an automatic annotation of WordNet with geographical coordinates.
The annotation has been carried out by extracting geographical synsets from WordNet, together with their holonyms and hypernyms, and comparing them to the entries in the Wikipedia-World geographical database. A weight was calculated for each of the candidate annotations, on the basis of matches found between the database entries and synset gloss, holonyms and hypernyms. The resulting resource may be used in Geographical Information Retrieval related tasks, especially for toponym disambiguation.
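The weighting step can be pictured with a small sketch. The abstract does not give the actual formula, so the one below (one point for each descriptive term of a database entry that appears in the synset gloss, holonyms or hypernyms) is an assumption, as are the `SynsetInfo` type, the function names and the toy data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SynsetInfo:
    """Hypothetical container for the textual evidence of a geographical synset."""
    gloss: str
    holonyms: List[str]
    hypernyms: List[str]


@dataclass
class DbEntry:
    """Hypothetical database entry: descriptive terms plus coordinates."""
    terms: List[str]
    coords: Tuple[float, float]


def candidate_weight(entry, synset):
    """Count matches between entry terms and the gloss, holonyms and hypernyms."""
    gloss_tokens = set(synset.gloss.lower().split())
    holonyms = {h.lower() for h in synset.holonyms}
    hypernyms = {h.lower() for h in synset.hypernyms}
    weight = 0
    for term in entry.terms:
        t = term.lower()
        weight += int(t in gloss_tokens) + int(t in holonyms) + int(t in hypernyms)
    return weight


def best_annotation(entries, synset):
    """Return the coordinates of the highest-weighted candidate entry."""
    return max(entries, key=lambda e: candidate_weight(e, synset)).coords


# Toy data, invented for this example (coordinates are approximate).
synset = SynsetInfo(
    gloss="a city in eastern England on the River Cam",
    holonyms=["England"],
    hypernyms=["city"],
)
entries = [
    DbEntry(terms=["Massachusetts", "city"], coords=(42.37, -71.11)),
    DbEntry(terms=["England", "city"], coords=(52.2, 0.12)),
]
coords = best_annotation(entries, synset)
```

In this toy case the second entry matches both the gloss and the holonym through "England", so it outweighs the first entry and its coordinates are attached to the synset.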
This paper explores a fully automatic knowledge-based method which performs noun sense disambiguation relying only on the WordNet ontology. The basis of the method is the idea of conceptual density, that is, the correlation between the sense of a given word and its context. A new formula for calculating the conceptual density is proposed and evaluated on the SemCor corpus.


In this talk (given in French), I gave a quick overview of my work since the Ph.D., with ontologies and Information Retrieval as the common thread. The talk was intended as a presentation to my new research group.