Many efforts have been deployed by the IR community to extend free-text query processing toward semi-structured XML search. Most methods rely on the concept of Lowest Common Ancestor (LCA) between two or more structural nodes to identify the most specific XML elements containing the query keywords posted by the user. Yet few existing approaches consider XML semantics, and those that do generally rely on computationally expensive word sense disambiguation (WSD) techniques, or apply semantic analysis in one stage only (performing query relaxation/refinement over the bag-of-words retrieval model) to reduce processing time. In this paper, we describe a new approach for XML keyword search that aims to address these limitations. Our solution first transforms the XML document collection (offline) and the keyword query (on-the-fly) into meaningful semantic representations using context-based and global disambiguation methods, specially designed to allow...
In accordance with H2020 guidelines on data management, every project participating in the pilot action on open access to research data should develop a Data Management Plan (DMP) specifying what data will be open. According to these guidelines, a DMP should detail: what data the project will collect and generate; whether, and how, this data will be exploited or shared and made accessible/open for verification and re-use; and how this data will be curated and preserved. Within that frame, this document constitutes the Data Management Plan and strategy for open data treatment within the HIT2GAP project. The document also specifies which data are kept confidential and explains why. It should be noted that this document belongs to Work Package 1 of the HIT2GAP project, which is responsible for collecting the requirements. The overall requirements of the project are directly related to the collected data in the project that will be used by data processin...
2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES), 2016
HAL (Le Centre pour la Communication Scientifique Directe), Nov 8, 2010
XML data flow has reached beyond the world of computer science and has spread to other areas such as data communication, e-commerce and instant messaging. Consequently, enabling non-expert programmers to manipulate this data is becoming imperative. On the one hand, Mashups emerged a few years ago, providing users with visual tools for web data manipulation that are not necessarily XML-specific. Mashups have been leaning towards functional composition, but no formal languages have yet been defined. On the other hand, visual languages for XML have been emerging since the standardization of XML, relying mostly on querying XML data for extraction or structure transformations. These languages are mainly based on existing textual XML languages, have limited expressiveness, and do not provide non-expert programmers with means to manipulate XML data. In this paper, we define a generic visual language called XCDL, based on Colored Petri Nets, that allows non-expert programmers to compose manipulation operations. The language is adapted to XML, providing users with means to compose XML-oriented operations. The core syntax of the language is presented here along with an implemented prototype based on it.
Proceedings of the 13th International Conference on Management of Digital EcoSystems, 2021
Many efforts have been deployed by the IR community to extend free-text query processing toward semi-structured XML search. Most methods rely on the concept of Lowest Common Ancestor (LCA) between two or more structural nodes to identify the most specific XML elements containing the query keywords posted by the user. Yet few existing approaches consider XML semantics, and those that do generally rely on computationally expensive word sense disambiguation (WSD) techniques, or apply semantic analysis in one stage only (performing query relaxation/refinement over the bag-of-words retrieval model) to reduce processing time. In this paper, we describe the building blocks of a new approach for XML keyword search that aims to address these limitations. Our solution first transforms the XML document collection (offline) and the keyword query (on-the-fly) into meaningful semantic representations using context-based and global disambiguation methods, specially designed to allow almost linear computation efficiency. The semantically augmented XML data tree is then processed for structural node clustering, based on semantic query concepts (i.e., key-concepts), in order to identify and rank candidate answer sub-trees containing related occurrences of query key-concepts. Preliminary experiments highlight the quality and potential of our approach.
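To make the LCA notion mentioned in the abstract above concrete, here is a minimal Python sketch of plain keyword-to-LCA lookup over a small XML tree. It illustrates only the generic syntactic LCA idea, not the semantic approach the paper proposes; the sample document and the helper names (`build_parent_map`, `find_by_keyword`, `lowest_common_ancestor`) are hypothetical and chosen purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = """
<library>
  <book>
    <title>XML Search</title>
    <author>Smith</author>
  </book>
  <book>
    <title>Petri Nets</title>
    <author>Jones</author>
  </book>
</library>
"""

def build_parent_map(root):
    """Map every element to its parent (ElementTree stores no parent links)."""
    return {child: parent for parent in root.iter() for child in parent}

def path_to_root(node, parents):
    """Return the list [node, parent, grandparent, ..., root]."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    return path

def find_by_keyword(root, keyword):
    """Return elements whose own text contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [el for el in root.iter() if el.text and kw in el.text.lower()]

def lowest_common_ancestor(a, b, parents):
    """Return the deepest element that is an ancestor-or-self of both a and b."""
    ancestors_of_a = set(path_to_root(a, parents))
    for node in path_to_root(b, parents):  # walk upward from b
        if node in ancestors_of_a:
            return node
    return None

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    parents = build_parent_map(root)
    title = find_by_keyword(root, "search")[0]
    author = find_by_keyword(root, "smith")[0]
    print(lowest_common_ancestor(title, author, parents).tag)
```

Two keywords that occur inside the same `book` element resolve to that `book`, whereas keywords from different books resolve to the enclosing `library` element, which is why the LCA is taken as the most specific element covering all query keywords.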
2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), 2017
This paper provides an overview of the problem of event-based collective knowledge management from shared multimedia data. We start by introducing key concepts and constructs related to the problem, including multimedia digital ecosystems, collaborative environments, and collective knowledge management. Then, we use a real-world motivating scenario to highlight some of the major challenges facing event-based knowledge organization in a multimedia collaborative environment, mainly the need to handle: i) heterogeneous data sources and their unstructured content, ii) large and growing volumes of data published online, iii) inconsistent and ambiguous multimedia data annotations, iv) misleading content (that is not event-related) published by inexperienced users, and v) multimedia data with missing event-related metadata. We then provide a short review of existing methods related to event detection from shared social multimedia data on the Web, contrasting their characteristics with respect to the above challenges, before highlighting potential research directions.
XML semantic-aware processing has become a motivating and important challenge in Web data management, data processing, and information retrieval. Although XML data is semi-structured, it remains prone to lexical ambiguity, and thus requires dedicated semantic analysis and sense disambiguation processes to assign well-defined meaning to XML elements and attributes. This becomes crucial in an array of applications, including semantic-aware query rewriting, semantic document clustering and classification, schema matching, as well as blog analysis and event detection in social networks and tweets. Most existing approaches in this context: i) ignore the problem of identifying ambiguous XML nodes, ii) only partially consider their structural relations/context, iii) use syntactic information in processing XML data regardless of the semantics involved, and iv) are static in adopting fixed disambiguation constraints, thus limiting user involvement. In this paper, we provide a new XML Seman...
Computer vision applications such as object detection and recognition allow machines to visualize and perceive their environments. Nevertheless, these applications are guided by learning-based methods that require capturing, storing and processing large amounts of images, thus rendering privacy and anonymity a major concern. In response, image obfuscation techniques (such as pixelating, blurring, and masking) have been developed to protect the sensitive information in images. In this paper, we propose a framework to evaluate and recommend the most robust obfuscation techniques in a specific domain of application. The proposed framework reconstructs obfuscated faces via deep learning-assisted attacks and assesses the reconstructions using structural/identity-based metrics. To evaluate and validate our approach, we conduct our experiments on a publicly available celebrity faces dataset. The obfuscation techniques considered are pixelating, blurring and masking. We evaluate the faces recon...
In the building and public works (BTP) sector, construction projects involve the exchange of a large volume of information between various actors with different areas of expertise and interests. Most of the data exchanged within such projects is unstructured or semi-structured, presented in heterogeneous documents (often multimedia, such as plans or reports), and comes from varied sources. Naturally, these documents are linked to one another by explicit links (e.g., references to all or part of a document introduced by the author) or implicit ones (e.g., according to the topics addressed in the documents, such as plumbing, electricity, or the thermal insulation of the building). Identifying this network of linked data between documents throughout the evolution of a construction project, from indexing through to information retrieval, is today essential to facilitate the task of a...
Computers and the Internet are everywhere nowadays, in every home, domain and field. Communications between users, applications and heterogeneous information systems are mainly carried out via XML-structured data. XML, based on simple textual data and not requiring any specific platform or environment, has come to dominate communication media. In the 21st century, these communications are now inter-domain and have stepped outside the scope of computer science into other areas (e.g., medical, commerce, social). As a consequence, and due to the increasing amount of XML data flowing between non-expert users (programmers, scientists, etc.), whether in instant messaging, social networks, data storage or elsewhere, it is becoming crucial to allow non-experts to manipulate and control their data (e.g., parents who want to apply parental control over instant messaging tools in their house, a journalist who wants to gather information from different RSS feeds a...
Since the emergence of Web 2.0, data has been flowing all over the web, through online and offline applications, and across all application domains. Web data (semi-structured data loaded through web browsers and applications communicating via internet protocols such as HTTP), in particular XML-based data, is being used for simple commercial information display (e.g., XHTML/HTML in commercial websites), instant messaging (e.g., XMPP for messaging in WhatsApp, Skype, Gtalk, etc.), financial transactions (e.g., CDF3 in e-commerce), medical record processing and storage (e.g., HL7 for electronic medical records), social media (e.g., XHTML/HTML in Facebook, LinkedIn, Google Plus, etc.), and others. This phenomenon has made web data manipulation (e.g., monitoring, modifying, controlling) by information technology (IT) experts, computer technicians and engineers extremely difficult, given its exponential growth in volume and diversity, not to mention the dynamicity of the data, which changes continuously, and its heterogeneity (e.g., HTML/HTML5, XML, XHTML, RDF, OWL). Consequently, the manipulation of web data, and in particular XML data (since XML has become one of the most essential data types used in computer communications), has shifted from the hands of computer scientists and programmers towards general computer users in all application domains. This has brought a new criterion into the web data manipulation research field: web data manipulation by non-experts. In this paper, we study and analyze existing techniques for manipulating semi-structured web data, particularly XML data, from a non-expert point of view, while relating them to traditional manipulation techniques defined in the literature (e.g., filtering, adaptation, data extraction, transformation, access control, encryption).
Web data manipulation techniques for non-experts were categorized under three major headings: (i) XML-oriented visual languages dealing with XML data extraction and transformations, (ii) Mashups tackling mainly XML restructuring with value manipulations, and (iii) dataflow visual programming languages targeting non-expert manipulations and providing means to visually manipulate scientific data. A full analysis was conducted, which allowed existing approaches/techniques to be compared and evaluated, providing an overview of the current requirements on this subject.
Papers by Gilbert Tekli