
    Nacéra Seghouani

    The definition of effective strategies for graph partitioning is a major challenge in distributed environments, since an effective graph partitioning considerably improves the performance of large graph data analytics computations. In this paper, we propose a multi-objective and scalable Balanced GRAph Partitioning (B-GRAP) algorithm to produce balanced graph partitions. B-GRAP is based on the Label Propagation (LP) approach and defines different objective functions to deal with either vertex or edge balance constraints while considering edge direction in graphs. The experiments are performed on various graphs while varying the number of partitions. We evaluate B-GRAP using several quality measures and the computation time. The results show that B-GRAP (i) provides a good balance while reducing the cuts between the different computed partitions and (ii) reduces the global computation time, compared to the Spinner algorithm.
    Several studies have shown that the users of Twitter reveal their interests (i.e., what they like) while they share their opinions, preferences and personal stories.
    Applying transfer learning based on pre-trained language models has become popular in Natural Language Processing. In this paper, we present a weakly supervised Named Entity Recognition system that uses a pre-trained BERT model and applies two consecutive fine-tuning steps. We aim to reduce the amount of human labour required for annotating data by proposing a framework that starts by creating a data set using lexicons and pattern recognition on documents. This first, noisy data set is used in the first fine-tuning step. Then, we apply a second fine-tuning step on a small, manually refined subset of the data. We apply and compare our system with the standard BERT fine-tuning approach on a large collection of old scanned documents. These documents are North Sea Oil & Gas reports, and the extracted knowledge would be used to assess the possibility of future carbon sequestration. Furthermore, we empirically demonstrate the flexibility of our framework by showing that it can be applied to entity identification in other domains.
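    A minimal sketch of the two-step fine-tuning idea described above, using the Hugging Face transformers Trainer API. The model name, label set, datasets and hyperparameters are illustrative assumptions, not the authors' configuration.

    # Two consecutive fine-tuning passes: first on weakly labelled data, then on
    # a small manually refined subset. Datasets are assumed to be tokenized and
    # label-aligned already.
    from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                              Trainer, TrainingArguments)

    LABELS = ["O", "B-ENT", "I-ENT"]          # hypothetical tag set
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=len(LABELS))

    def fine_tune(model, dataset, output_dir, epochs):
        """One fine-tuning pass over a tokenized, label-aligned dataset."""
        args = TrainingArguments(output_dir=output_dir,
                                 num_train_epochs=epochs,
                                 per_device_train_batch_size=16)
        Trainer(model=model, args=args, train_dataset=dataset).train()
        return model

    # Step 1: weak supervision -- labels produced by lexicons / regex patterns.
    # Step 2: a small, manually corrected subset refines the model.
    # model = fine_tune(model, noisy_dataset,   "step1-noisy",   epochs=3)
    # model = fine_tune(model, refined_dataset, "step2-refined", epochs=3)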
    Ontology mappings provide a common layer which allows distributed applications to share and exchange semantic information. Providing mechanized ways of mapping ontologies is a challenging issue, and the main problems to be faced are related to structural and semantic heterogeneity. The complexity of these problems increases in the presence of spatiotemporal information such as geometry and intrinsic topological characteristics. Our proposal is intended for spatiotemporal ontologies and focuses on providing integrated access to information sources using local ontologies. Our approach aims to build a system that guides users to derive meaningful mappings and to reason about them. To achieve this, we use a description logic extended with a spatiotemporal concrete domain. The ontology of each source is normalized in a common extended Ontology Web Language (OWL), which enables a natural correspondence with the spatiotemporal description logic formalism.
    Ontologies provide a common layer that plays a major role in information exchange and sharing. The proliferation of ontologies relies strongly on the automation of their building, integration and deployment processes. In this paper, we present an integrated framework involving complementary dimensions to drive the (semi-)automatic acquisition of conceptual knowledge from HTML Web pages. Our approach takes advantage of structural HTML document features and word location to identify the appropriate term context. Our context definition improves word weighting, the selection of the semantically closest co-occurrents and the relevance of the extracted ontological concepts. We use an unsupervised clustering method to generate term groups. Note that the chosen clustering method relies on an incremental, user-driven quality evaluation process. After a theoretical presentation of our structural context definition, we summarize the most significant results obtained by applying our method to a corpus dedicated to the tourism domain. The first results show how the definition of an appropriate context improves the relevance of the extracted concepts.
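    A minimal sketch of structure-aware co-occurrence weighting in the spirit of the context definition above: terms that co-occur inside the same HTML block count more than terms that merely share a page. The tag weights and the scoring rule are illustrative assumptions.

    from collections import Counter
    from itertools import combinations

    # Hypothetical weights: the tighter the structural unit, the stronger the link.
    TAG_WEIGHT = {"title": 3.0, "h1": 2.5, "li": 2.0, "p": 1.0}

    def cooccurrence_weights(blocks):
        """`blocks` is a list of (tag, [terms]) pairs extracted from an HTML page."""
        weights = Counter()
        for tag, terms in blocks:
            w = TAG_WEIGHT.get(tag, 0.5)
            for t1, t2 in combinations(sorted(set(terms)), 2):
                weights[(t1, t2)] += w
        return weights

    # Terms sharing an <li> are weighted more strongly than those sharing a <p>.
    blocks = [("li", ["hotel", "room"]), ("p", ["hotel", "tourism", "room"])]
    print(cooccurrence_weights(blocks))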
    Ontologies form the building block that supports information exchange and sharing, extending the syntactic interoperability of the Web into semantic interoperability. The success of the Semantic Web depends on the degree of automation of ontology construction, deployment and proliferation. In this article, we present an incremental method for extracting ontological concepts from HTML documents in order to build a domain ontology. We exploit the structural features of HTML documents to locate and define an appropriate context for each term according to its position in the corpus. Our contextual definition makes it possible to select semantically close co-occurrents and to define an appropriate weighting measure for each pair of terms. To obtain classes of terms, we defined the algorithmic principles of a context-guided clustering method. Our approach is based on an interactive and incremental evaluation of cluster quality by the user. We experimented with these algorithmic principles on a domain corpus about tourism. The first results show that taking term context into account considerably improves the relevance of the extracted concepts.
    Social Networking Sites, such as Facebook and LinkedIn, are clear examples of the impact that Web 2.0 has on people around the world, because they target an aspect of life that is extremely important to anyone: social relationships. The key to building a social network is the ability to find people that we know in real life, which, in turn, requires those people to make some personal information publicly available, such as their names, family names, locations and birth dates, just to name a few. However, it is not uncommon that individuals create multiple profiles in several social networks, each containing partially overlapping sets of personal information. Matching those different profiles makes it possible to create a global profile that gives a holistic view of the information about an individual. In this paper, we present an algorithm that uses the network topology and the publicly available personal information to iteratively match profiles across n social networks, based on those individuals who disclose the links to their multiple profiles. The evaluation results, obtained on a real dataset composed of around 2 million profiles, show that our algorithm achieves high accuracy.
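    A minimal sketch of iterative cross-network profile matching, assuming the high-level idea above: start from users who explicitly link their own profiles (seed matches), then propagate matches to neighbours whose attributes agree. The attribute names and the similarity rule are assumptions, not the paper's algorithm.

    def attribute_match(p, q):
        """Hypothetical rule: same name and same location."""
        return p["name"] == q["name"] and p.get("location") == q.get("location")

    def iterative_match(net_a, net_b, seeds):
        """net_a / net_b: {profile_id: {"name": ..., "location": ..., "friends": set()}}.
        seeds: initial (id_a, id_b) pairs disclosed by the users themselves."""
        matched = set(seeds)
        frontier = list(seeds)
        while frontier:
            a, b = frontier.pop()
            # Compare the neighbourhoods of two already-matched profiles.
            for na in net_a[a]["friends"]:
                for nb in net_b[b]["friends"]:
                    pair = (na, nb)
                    if pair not in matched and attribute_match(net_a[na], net_b[nb]):
                        matched.add(pair)
                        frontier.append(pair)
        return matched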
    Relation extraction is a difficult, open research problem with important applications in several fields such as knowledge management, web mining, ontology building, intelligent systems, etc. In our research, we focus on extracting relations among ontological concepts in order to build a domain ontology. In this paper, firstly, we answer some crucial questions related to the text analyses, the word
    Ontologies provide a common layer which plays a major role in supporting information exchange and sharing. In this paper, we focus on the process of extracting ontological concepts from HTML documents. In order to improve this process, we propose an unsupervised hierarchical clustering algorithm, named "contextual ontological concept extraction" (COCE), which uses the partitioning algorithm K-means incrementally and is guided by a structural context. Our context exploits the HTML structure and the location of words to select the semantically closest co-occurrents for each word and to improve word weighting. Guided by this context definition, we perform an incremental clustering that refines the context of each word cluster to obtain semantically meaningful concepts. The COCE algorithm offers the choice between a fully automatic execution and a user-interactive one. We evaluate our algorithm on HTML documents related to the tourism domain. Our results show how the execution of our context-based algorithm, which implements an incremental process and a successive refinement of clusters, improves their conceptual quality and the relevance of the extracted ontological concepts.
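    A minimal sketch of the incremental clustering idea described above: run K-means, then re-cluster the clusters whose quality is judged insufficient. The vectors, quality threshold and splitting rule are illustrative assumptions, not the COCE algorithm itself.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_samples

    def incremental_kmeans(vectors, k=2, min_quality=0.3, min_size=4):
        """Recursively split low-quality clusters of term vectors.
        Returns a list of index arrays into `vectors`, one per final cluster."""
        if len(vectors) < 2 * min_size:
            return [np.arange(len(vectors))]
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
        if len(set(labels)) < 2:                       # degenerate split: stop here
            return [np.arange(len(vectors))]
        quality = silhouette_samples(vectors, labels)  # per-term cohesion score
        clusters = []
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            # Keep the cluster if it is small or cohesive enough; otherwise refine it.
            if len(members) < 2 * min_size or quality[members].mean() >= min_quality:
                clusters.append(members)
            else:
                for sub in incremental_kmeans(vectors[members], k, min_quality, min_size):
                    clusters.append(members[sub])
        return clusters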
    Ontologies provide a common layer which plays a major role in supporting information exchange and sharing. In this paper, we focus on the process of extracting ontological concepts from HTML documents. We propose an unsupervised hierarchical clustering algorithm, named "Contextual Ontological Concept Extraction" (COCE), which uses a partitioning algorithm incrementally and is guided by a structural context. This context exploits the HTML structure and the location of words to select the semantically closest co-occurrents for each word and to improve word weighting. Guided by this context definition, we perform an incremental clustering that refines the word context of each cluster to obtain semantically meaningful concepts. The COCE algorithm offers the choice between an automatic execution and an interactive one. We evaluate the COCE algorithm on French documents related to the tourism domain. Our results show how the execution of our context-based algorithm improves the conceptual quality of the clusters.
    Ontology evaluation is vital for the development and deployment of many applications such as data annotation, information retrieval and the Semantic Web. In this paper, we first present a survey of different aspects of ontology evaluation. We then focus on the task of evaluating ontological concepts. We propose a new evaluation method based on a large collection of web documents and several context definitions deduced from it by applying a linguistic and a documentary analysis. Based on these two context types, we define an algorithm which computes the credibility degree associated with each word cluster and each context. Our algorithm, called "Credibility Degree Computation" (CDC), assists the expert: it provides the possible word associations existing in these contexts, suggests semantic tags, and deletes noisy elements or moves them to their appropriate cluster. The CDC algorithm reports on the initial words of a given cluster and facilitates the evaluation task. Our evaluation method, which provides a qualitative and quantitative analysis, does not depend on a gold standard and can be applied to any domain, even when expert intervention is not available. It also supports ontology reuse and evolution, since the informing elements which support the experts' interpretation are driven by changes on the web and are stored with the experts' comments for later use. Our experiments are conducted on French documents related to the tourism domain. The first results show how our method assists and facilitates the evaluation task for the domain expert.
    The definition of effective strategies for graph partitioning is a major challenge in distributed environments, since an effective graph partitioning considerably improves the performance of large graph data analytics computations. In this paper, we propose a multi-objective and scalable Balanced GRAph Partitioning (B-GRAP) algorithm, based on the Label Propagation (LP) approach, to produce balanced graph partitions. B-GRAP defines a new, efficient initialization procedure and different objective functions to deal with either vertex or edge balance constraints while considering edge direction in graphs. B-GRAP is implemented on top of the open-source distributed graph processing system Giraph. The experiments are performed on various graphs with different structures and sizes (up to 50.6M vertices and 1.9B edges) while varying the number of partitions. We evaluate B-GRAP using several quality measures and the computation time. The results show that B-GRAP (i) provides a good balance while reducing the cuts between the different computed partitions and (ii) reduces the global computation time, compared to LP-based algorithms.
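    A minimal sketch of balance-constrained label propagation, illustrating the general LP idea the paper builds on rather than the B-GRAP objective functions themselves. The capacity rule and parameters are assumptions.

    import random
    from collections import Counter

    def lp_partition(adj, k, rounds=10, imbalance=1.05):
        """adj: {vertex: [neighbours]}; k: number of partitions."""
        vertices = list(adj)
        capacity = imbalance * len(vertices) / k            # max vertices per partition
        label = {v: random.randrange(k) for v in vertices}  # random initialization
        for _ in range(rounds):
            sizes = Counter(label.values())
            for v in vertices:
                votes = Counter(label[u] for u in adj[v])   # neighbours' partitions
                # Adopt the most frequent neighbouring label that still has room.
                for lab, _ in votes.most_common():
                    if lab == label[v]:
                        break
                    if sizes[lab] + 1 <= capacity:
                        sizes[label[v]] -= 1
                        sizes[lab] += 1
                        label[v] = lab
                        break
        return label

    # Two triangles usually end up in two different partitions.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
    print(lp_partition(adj, k=2))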
    The World-Wide Web hosts many autonomous and heterogeneous information sources. In the near future each source may be described by its own ontology. The distributed nature of ontology development will lead to a large number of local ontologies covering overlapping domains. Ontology integration will then become an essential capability for effective interoperability and information sharing. Integration is known to be a hard problem, whose complexity increases particularly in the presence of spatiotemporal information. Space and time entail additional problems such as the heterogeneity of the granularity used in representing spatial and temporal features. Spatio-temporal objects possess intrinsic characteristics that make them more complex to handle, and are usually related by specific relationships such as topological, metric and directional relations. The integration process must be enhanced to tackle mappings involving these complex spatiotemporal features. Recently, several tools have been developed to provide support for building mappings. The tools are usually based on heuristic approaches that identify structural and naming similarities [1]. They can be categorized by the type of inputs required for the analysis: descriptions of concepts in OBSERVER [2], concept hierarchies in iPrompt and AnchorPrompt [3], and instances of classes in GLUE [4] and FCA-Merge [5]. However, complex mappings involving spatiotemporal features require feedback from a user to further refine proposed mappings and to manually specify mappings not found by the tools.
    Probabilistic validation by rare-event guided simulation is a new approach for validating complex, dependable, safety-critical and distributed systems. It relies on a partial analysis of a model of the system and attempts to prove that the violation of correct-operation properties, over a given period of use, has a sufficiently low probability. Incorrect behaviour (with respect to the required properties) is a rare event in the life of such systems. When it occurs, it is the consequence of a chain of complex, unknown and therefore hard-to-apprehend events. The system to be validated is modelled as a stochastic Petri net. An efficient simulation of the Petri net must be able to sample complex and improbable trajectories (firing sequences) leading to failure markings (in which the property to be validated does not hold). Two problems therefore arise. First, the transition firing sequences likely to lead to failure (critical sequences) must be characterized through a structural and qualitative analysis of the Petri net. From the incidence matrix of the net and the specification of the properties to be checked, a system of linear equations, called the decision system, is established. The set of solutions of this system contains all the characteristic vectors of the critical sequences. The second problem is to increase the probability of these sequences so that they are visited with sufficient frequency during simulation sampling. An importance sampling method is used to increase this probability and to build a sound estimator of the achieved validation level. A simulation algorithm has been developed and implemented. It uses qualitative procedures based on different guiding strategies and quantitative sampling procedures following different approaches. We demonstrate the effectiveness of probabilistic validation with the developed techniques on two industrial applications.
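    The abstract relies on importance sampling to make rare failure trajectories reachable during simulation. A minimal sketch of the likelihood-ratio correction on a toy Bernoulli failure model; the toy model, probabilities and sample size are illustrative assumptions, not the stochastic Petri net setting of the thesis.

    import random

    def estimate_failure_prob(p_fail=1e-4, p_bias=0.1, n=100_000):
        """Estimate P(failure) of a single Bernoulli step by sampling under a
        biased probability p_bias and re-weighting each hit by p_fail / p_bias."""
        total = 0.0
        for _ in range(n):
            failed = random.random() < p_bias          # sample under the biased law
            if failed:
                total += p_fail / p_bias               # likelihood-ratio correction
        return total / n

    # Close to 1e-4 with far fewer samples than naive Monte Carlo would need.
    print(estimate_failure_prob())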
    SHIRI is an ontology-based system for the integration of semi-structured documents related to a specific domain. The system's purpose is to allow users to access relevant parts of documents as answers to their queries. SHIRI uses RDF/OWL for the representation of resources and SPARQL for querying them. It relies on an automatic, unsupervised and ontology-driven approach for the extraction, alignment and semantic annotation of tagged elements of documents. In this paper, we focus on the Extract-Align algorithm, which exploits a set of named entity and term patterns to extract term candidates to be aligned with the ontology. It proceeds in an incremental manner in order to populate the ontology with terms describing instances of the domain and to reduce the access to external resources such as the Web. We evaluate it on an HTML corpus related to calls for papers in computer science, and the results we obtain are very promising. These results show how the incremental behaviour of Extra...
    In this article, we present a preliminary study of the problem of characterizing and categorizing the interests that individuals declare on social networks, such as reading, jogging, java, etc. Since these interests are expressed in natural language, we face a disambiguation problem in a limited context, given the little information available in individuals' profiles. Approaches aiming at tag disambiguation in folksonomies face the same problem, although they can rely on the tagged resources to obtain a richer context [GarciaSilva et al. (2012)]. We explore an approach that disambiguates an individual's interest by determining a Wikipedia article that contains its description. The disambiguation of an interest uses the other interests declared by the individual as context. The results we obtained on ...
    Twitter is a social network that offers a rich and interesting source of information, but one that is challenging to retrieve and analyze. Twitter data can be accessed using a REST API. The available operations allow retrieving tweets on the basis of a set of keywords, but with limitations such as the number of calls per minute and the size of the results. Besides, there is no control over the retrieved results, and finding tweets which are relevant to a specific topic is a big issue. Given these limitations, it is important that the query keywords cover the topic of interest unambiguously, in order both to reach the relevant answers and to decrease the number of API calls. In this paper we introduce a new crawling algorithm called "Smart Twitter Crawling" (STiC) that retrieves a set of tweets related to a target topic. In this algorithm we take an initial keyword query and enrich it using a set of additional keywords that come from different data sources. The STiC algorithm relies on a DFS search in Twit...
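    A minimal sketch of the keyword-enriched, DFS-style crawling idea described above. The search_tweets stub stands in for the real Twitter REST call, and the relevance rule and thresholds are illustrative assumptions, not the STiC algorithm itself.

    def search_tweets(keyword):
        """Stand-in for the Twitter REST call; returns canned tweets for the demo."""
        demo = {"python": ["learning python for data analysis", "python snakes at the zoo"],
                "pandas": ["pandas makes python data analysis easy"]}
        return demo.get(keyword, [])

    def relevance(tweet, topic_keywords):
        """Score a tweet by the fraction of topic keywords it mentions."""
        text = tweet.lower()
        return sum(kw in text for kw in topic_keywords) / len(topic_keywords)

    def crawl(seed_keywords, enriched_keywords, min_score=0.5, max_calls=50):
        """Depth-first exploration of the keyword set, keeping relevant tweets."""
        topic = set(seed_keywords)
        stack = list(seed_keywords) + list(enriched_keywords)
        seen, kept, calls = set(), [], 0
        while stack and calls < max_calls:
            kw = stack.pop()                  # DFS: explore the latest keyword first
            if kw in seen:
                continue
            seen.add(kw)
            calls += 1
            for tweet in search_tweets(kw):
                if relevance(tweet, topic) >= min_score:
                    kept.append(tweet)
        return kept

    print(crawl(["python", "data", "analysis"], ["pandas"]))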
    Thanks to Linked Open Data, more and more RDF sources are made available on the Web. However, although these sources are large and evolving, they contain relatively little information compared to the volume of information contained in semi-structured documents. Many tools aim at semantically annotating these documents, but relation extraction remains a particularly difficult task when the structure and the vocabulary of the documents are heterogeneous. In this article, we propose an approach for enriching and querying one or more RDF/OWL knowledge bases by exploiting a set of semantically annotated documents. These bases are enriched with uncertain relation instances inferred from the structure of the documents, the ontologies and the facts already present in the knowledge bases. A SPARQL query formulated in the domain vocabulary is reformulated in order to combine the facts coming from the different bases and to rank the answers according to the assigned weights. The approach was evaluated on HTML documents and knowledge bases from the Linked Open Data. The results show that 63.3% of the relations found are new, with a precision reaching 62%.
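    A minimal sketch of the idea of combining facts from several knowledge bases and ranking answers by the weight attached to each (possibly inferred) relation instance. The triple layout, the weights and the ranking rule are illustrative assumptions.

    from collections import defaultdict

    # (subject, predicate, object, weight, source) -- weight 1.0 for asserted facts,
    # lower for relation instances inferred from document structure.
    facts = [
        ("doc:Paper42", "ex:writtenBy", "ex:Alice", 1.0, "kb1"),
        ("doc:Paper42", "ex:writtenBy", "ex:Bob",   0.6, "inferred"),
        ("doc:Paper42", "ex:writtenBy", "ex:Carol", 0.3, "inferred"),
    ]

    def answers(subject, predicate):
        """Collect matching objects from every source and rank them by best weight."""
        scores = defaultdict(float)
        for s, p, o, w, _src in facts:
            if s == subject and p == predicate:
                scores[o] = max(scores[o], w)
        return sorted(scores.items(), key=lambda kv: -kv[1])

    print(answers("doc:Paper42", "ex:writtenBy"))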
    The Linked Open Data initiative has brought more and more RDF data sources to be published on the Web. However, these data sources contain relatively little information compared to the documents available on the surface Web. Many annotation tools have been proposed in the last decade for the automatic construction and enrichment of knowledge bases. But, while noticeable advances have been achieved for the extraction of concept instances, the extraction of semantic relations remains a challenging task when the structures and the vocabularies of the target documents are heterogeneous. In this paper, we propose a novel approach, called REISA, which enriches RDF/OWL knowledge bases with semantic relations using semi-structured documents annotated with concept instances. REISA produces weighted relation instances without exploiting lexico-syntactic or structural regularities in the documents. Neighbouring domain entities in the annotated documents are used to generate the first sets of candidate relations according to the domain and range axioms defined in a domain ontology. The construction of these candidate sets relies on automated semantic controls performed with (i) the existing knowledge bases and (ii) the (inverse) functionality of the target relations. The weighting of the selected relation candidates is performed according to the neighbourhood distance between the annotated domain entities in the document. The proposed approach is complementary to classic pattern-matching and machine-learning approaches and achieves interesting results without exploiting document-level regularities. Experiments on two real web datasets show that (i) REISA extracts semantic relationships with interesting precision values reaching 76.5% and that (ii) the weighting method is effective for ranking the relation candidates according to their precision.
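    A minimal sketch of weighting candidate relation instances by the distance between annotated entities in a document, in the spirit of the idea above. The annotation format, the domain/range check and the 1/(1+d) weight are illustrative assumptions, not the REISA definitions.

    def candidate_relations(annotations, domain_type, range_type):
        """annotations: (position, entity_uri, entity_type) tuples found in one document,
        for one target relation whose domain and range types are given.
        Returns {(subject, object): weight}, keeping the best weight per pair."""
        candidates = {}
        for pos_s, ent_s, type_s in annotations:
            if type_s != domain_type:
                continue
            for pos_o, ent_o, type_o in annotations:
                if type_o != range_type or ent_s == ent_o:
                    continue
                weight = 1.0 / (1 + abs(pos_s - pos_o))   # closer entities -> higher weight
                key = (ent_s, ent_o)
                candidates[key] = max(candidates.get(key, 0.0), weight)
        return candidates

    anns = [(0, "ex:Alice", "Person"), (3, "ex:KU", "University"), (9, "ex:MIT", "University")]
    print(candidate_relations(anns, "Person", "University"))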
    In online social networks, individuals are given the option to reveal on their online profiles some personal information about themselves including, among others, their home location that, if specified, is typically referred to with a toponym. A toponym disclosed by an individual on her profile, or self-reported toponym (e.g., "London"), is often ambiguous and may be interpreted as the name of different geographic locations ("London, U.K.", "London, Ontario, Canada", "London, Arkansas, USA"). The determination of the right interpretation of a self-reported toponym is challenging because the profile of an individual usually lacks the contextual information that can be used to disambiguate the toponym. In this paper, we propose LocusRank, an algorithm for the disambiguation (or resolution) of the self-reported toponyms in online social network profiles. More precisely, LocusRank chooses the right interpretation of a self-reported toponym of an individ...
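    A minimal sketch of scoring candidate interpretations of a self-reported toponym using other locations resolved from the same profile's neighbourhood (e.g., friends' locations). The scoring rule is an illustrative assumption, not the LocusRank algorithm.

    import math

    def score_candidates(candidates, context_points):
        """candidates: {interpretation: (lat, lon)}; context_points: (lat, lon) pairs
        already resolved from the user's neighbourhood. Closer candidates score higher."""
        def dist(a, b):
            return math.hypot(a[0] - b[0], a[1] - b[1])   # rough planar distance
        scores = {}
        for name, coord in candidates.items():
            scores[name] = sum(1.0 / (1 + dist(coord, p)) for p in context_points)
        return max(scores, key=scores.get)

    londons = {"London, UK": (51.5, -0.1), "London, Ontario": (42.98, -81.25)}
    friends = [(48.85, 2.35), (52.52, 13.40)]             # e.g. Paris and Berlin
    print(score_candidates(londons, friends))              # -> "London, UK"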
    Social Networking Sites, such as Twitter and LinkedIn, are clear examples of the impact that Web 2.0 has on people around the world, because they target an aspect of life that is extremely important to anyone: social relationships. The key to building a social network is the ability to find people that we know in real life, which, in turn, requires those people to make some personal information publicly available, such as their names, family names, locations and birth dates, just to name a few. However, it is not uncommon that individuals create multiple profiles in several social networks, each containing partially overlapping sets of personal information. As a result, the search for an individual might require numerous queries to match the information that is spread across many profiles, unless an efficient way is provided to automatically integrate those profiles and obtain a holistic view of the information on the individual. This calls for efficient algorithms for the determination (or reconciliation) of the profiles created by an individual across social networks. In this paper, we build on previous research of ours and describe LIAISON (reconciLIAtion of Individuals profiles across SOcial Networks), an algorithm that uses the network topology and the publicly available personal information to iteratively reconcile profiles across n social networks, based on the existence of individuals who disclose the links to their multiple profiles. We evaluate LIAISON on real large datasets and compare it against existing approaches; the results of the evaluation show that LIAISON achieves high accuracy.
    Ontologies form the basic building block of the Semantic Web, and its success depends on their deployment and proliferation. In this paper, we propose an approach for automating the ontology construction process from Web pages. Our approach exploits the structure of the HTML document and uses a data mining method to discover concepts, to organize them into a hierarchy and to enrich them. It covers ontology construction from corpus formation up to the formalization of the knowledge stored in an ontology.

    And 52 more