Teambuilder is a prototype of a recommendation engine built on a Big Human Resources Data platform, which securely retrieves the best candidates (resumes) for a specific mission.
Corpora are the raw material of computational linguistics and natural language processing. Few languages have corpora rich in web resources (forums, blogs, etc.), even though these are sometimes the only resources available. Yet such resources contain a lot of noise (menus, advertisements, etc.). Filtering out parasitic data and repetitions requires large-scale cleaning that researchers generally do by hand. This thesis proposes an automatic system for building web corpora cleaned of their noise. It consists of three modules: (a) a corpus-construction module for any language and any type of data, designed to be collaborative and versioned; (b) a web-page crawling module oriented toward forums and blogs; (c) a relevant-data extraction module using clustering techniques with different distances computed from the page structure. The system…
ACS/IEEE International Conference on Computer Systems and Applications, 2019
The aim of sentiment analysis, also known as opinion mining, is to discover subjective information by understanding the public opinions, standpoints and attitudes expressed in shared text, such as consumer feedback, with a focus on automated tools. In this paper, we introduce a novel model based on a learning approach to improve the understanding and prediction of the sentiment of a text. Our learning approach establishes an effective penalty mechanism to map out the links between analyzed contexts in which the sentiment is similar. On a gold-standard corpus we released, the results obtained are better in terms of precision, recall, and computation time, using a multithreaded model.
For companies, the need to deal efficiently with vast amounts of integrated multi-source data is becoming crucial. Core concerns are 1) proper and flexible human resources management approaches, 2) more effective resource allocation, and 3) team staffing. We propose to address the talent-search problem. Our approach is based on the characterization and normalization of professional skills, which also helps in matching unstructured documents (such as resumes against job descriptions). To this end, we first provide a complete information technology skills taxonomy, together with a taxonomy of companies and their sectors of activity, in order to enhance named entity recognition and normalization. We next design a flexible, scalable and secure architecture integrating multi-source big data, which provides efficient unstructured-document analysis and matching. Finally, we evaluate the performance of our platform using real data.
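The normalization-then-matching idea described above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the taxonomy entries, surface forms and function names are all hypothetical, and real matching would involve the full taxonomy and named entity recognition.

```python
# Hypothetical sketch: map raw skill mentions to canonical taxonomy entries,
# then score a resume against a job description on the normalized sets.

SKILL_TAXONOMY = {
    # canonical skill -> surface forms seen in documents (illustrative data)
    "javascript": {"js", "javascript", "ecmascript"},
    "postgresql": {"postgres", "postgresql", "psql"},
    "machine learning": {"ml", "machine learning"},
}

def normalize(token):
    """Return the canonical skill name for a raw token, or None."""
    token = token.lower().strip()
    for canonical, variants in SKILL_TAXONOMY.items():
        if token in variants:
            return canonical
    return None

def normalize_set(tokens):
    """Normalized skill set for a list of raw tokens (unknowns dropped)."""
    out = set()
    for t in tokens:
        c = normalize(t)
        if c:
            out.add(c)
    return out

def match_score(resume_tokens, job_tokens):
    """Jaccard overlap between the two normalized skill sets."""
    r, j = normalize_set(resume_tokens), normalize_set(job_tokens)
    return len(r & j) / len(r | j) if r | j else 0.0

print(match_score(["JS", "Postgres"], ["JavaScript", "ML"]))
```

The point of the normalization step is that "JS" and "JavaScript" count as the same skill, which a raw string comparison would miss.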
2020 7th International Conference on Internet of Things: Systems, Management and Security (IOTSMS)
Ciphertext-policy attribute-based encryption (CP-ABE) is a promising technique to ensure security in third-party trust environments and offers opportunities to their users. However, policy updating becomes a challenging issue when CP-ABE is used to construct access control schemes. The traditional method imposes a heavy workload on the data owner: retrieving the data, re-encrypting it under the new access policy, and sending it back to the cloud. These interactions incur a heavy computation and communication burden on the data owner. In this paper, we propose a novel approach that, on the one hand, enhances security by using blockchain technology and, on the other hand, updates the access policy dynamically. We use the blockchain to deploy a policy in a manner that preserves security, and the cloud to store the data encrypted with CP-ABE; in particular, we focus on delegating the policy-updating method to the cloud. This method minimizes the computation work and avoids retransmitting encrypted data by combining the ciphertext with the previous access strategy. We also design a policy-updating algorithm. In our scheme, security rests on two factors: a user must satisfy the CP-ABE policy, and must also satisfy the policy deployed in the blockchain in order to obtain the authorization token that grants access to the desired resources.
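The second factor, matching hidden attributes in the request against hidden attributes in the deployed policy before issuing a token, can be sketched in a few lines. This is only a stand-in: salted hashing is used here instead of the paper's encryption, and all names and the token format are hypothetical.

```python
# Hypothetical sketch of attribute matching for token issuance. Attributes
# are hidden (salted hash here, standing in for encryption) both in the
# stored policy and in the user's request; equality of the hidden values
# is what triggers token generation.

import hashlib
import secrets

SALT = b"demo-domain-salt"  # shared blinding value (illustrative)

def hide(attribute):
    """Deterministically hide an attribute so equality is still testable."""
    return hashlib.sha256(SALT + attribute.encode()).hexdigest()

def deploy_policy(attributes):
    """What the smart contract would store: hidden attributes only."""
    return {hide(a) for a in attributes}

def request_token(policy, user_attributes):
    """Issue an opaque authorization token iff the hidden user attributes
    cover the hidden policy attributes; otherwise refuse."""
    if {hide(a) for a in user_attributes} >= policy:
        return secrets.token_hex(16)
    return None

policy = deploy_policy({"role:doctor", "dept:cardiology"})
print(request_token(policy, {"role:doctor", "dept:cardiology"}) is not None)
print(request_token(policy, {"role:nurse"}) is None)
```

Because only hidden values appear on chain, observers of the transparent blockchain learn neither the policy attributes nor the requester's attributes, which is the transparency problem the abstract mentions.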
2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA), 2018
This paper presents DyCorC, an extractor and cleaner of web forum content. Its main strengths are that the process is entirely automatic, language-independent, and adaptable to all kinds of forum architectures. The corpus is built according to user queries using expressions or keywords, as in search engines; DyCorC then strips the boilerplate, gathering comments and ratings for further feature-based opinion mining and sentiment analysis. Such noiseless corpora are usually built by hand with crawlers and scrapers, with specific wrappers devised for each type of forum, entailing a lot of work and skill. Our aim is to cut down this preprocessing stage. Our algorithm is compared to state-of-the-art models (Apache Nutch, BootCat, JusText) on a gold-standard corpus we released. DyCorC offers better-quality noiseless content extraction. Its algorithm is based on DOM trees with string distances, seven of which have been compared on the reference corpus, and…
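The core idea, clustering DOM nodes by a string distance over their tag paths, can be illustrated with a toy sketch. It is not DyCorC's algorithm: the similarity measure (difflib's `SequenceMatcher` ratio instead of the paper's seven string distances), the threshold, and the sample paths are all assumptions for illustration.

```python
# Hypothetical sketch: represent each DOM node by its tag path and greedily
# cluster paths that are close under a string similarity. Repeated, similar
# paths typically correspond to post containers; one-off paths (menus, ads)
# end up in small clusters and can be discarded as boilerplate.

from difflib import SequenceMatcher

def path_similarity(p, q):
    """Similarity in [0, 1] between two tag paths (stand-in distance)."""
    return SequenceMatcher(None, p, q).ratio()

def cluster_paths(paths, threshold=0.8):
    """Greedy single-pass clustering: join the first cluster whose
    representative is similar enough, else start a new cluster."""
    clusters = []
    for p in paths:
        for c in clusters:
            if path_similarity(p, c[0]) >= threshold:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

paths = [
    "html/body/div/ul/li/a",           # menu link (boilerplate)
    "html/body/div/article/div/p",     # post 1
    "html/body/div/article/div/p",     # post 2
    "html/body/div/article/div/span",  # post metadata
]
largest = max(cluster_paths(paths), key=len)
print(len(largest))  # the largest cluster gathers the repeated post nodes
```

Keeping only the dominant clusters and dropping the small ones is one plausible way to separate content from boilerplate without any language-specific or site-specific rules.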
2020 Fifth International Conference on Fog and Mobile Edge Computing (FMEC)
Fog computing is a new distributed computing paradigm that extends the cloud to the network edge. It aims at improving quality of service, data access, networking, computation and storage. However, security and privacy issues persist, even though many cloud solutions have been proposed; indeed, fog computing introduces new security and privacy challenges due to its specific features, such as mobility, geo-distribution and heterogeneity. Blockchain is an emerging concept bringing efficiency to many fields. In this paper, we propose a new blockchain-based access control scheme for fog computing, with fault tolerance, in the context of the Internet of Things. The blockchain provides secure management of authentication and of the access process for IoT devices. Each network entity authenticates in the blockchain via its wallet, which enables secure communication in a decentralized environment and thus achieves the security objectives. In addition, a secure connection is established between users and IoT devices if the users' attributes satisfy the policy stored in the blockchain by a smart contract. We also address the blockchain transparency problem by encrypting the users' attributes both in the policy and in the request; an authorization token is generated if the encrypted attributes are identical. Moreover, our proposal offers higher scalability, availability and fault tolerance among fog nodes thanks to load balancing implemented with the Min-Min algorithm.
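The Min-Min load-balancing step mentioned at the end works by repeatedly assigning the task with the smallest minimum completion time to the node that achieves it. The sketch below shows the classic algorithm under illustrative assumptions (task sizes in abstract work units, node speeds in units per second); the paper's actual fog-node model is not reproduced.

```python
# Hypothetical Min-Min scheduling sketch. tasks: {task: size};
# node_speeds: {node: units/sec}. Each iteration picks the (task, node)
# pair with the globally smallest completion time, assigns it, and updates
# the node's ready time.

def min_min(tasks, node_speeds):
    ready = {n: 0.0 for n in node_speeds}  # when each node becomes free
    assignment = {}
    pending = dict(tasks)
    while pending:
        task, node, finish = min(
            ((t, n, ready[n] + size / node_speeds[n])
             for t, size in pending.items()
             for n in node_speeds),
            key=lambda x: x[2],
        )
        assignment[task] = node
        ready[node] = finish
        del pending[task]
    return assignment

# Two equally fast fog nodes: Min-Min spreads the short tasks across them.
asg = min_min({"t1": 10, "t2": 100, "t3": 10}, {"fog1": 10, "fog2": 10})
print(asg)
```

With heterogeneous speeds, Min-Min tends to favor the fastest nodes, which is exactly why it is paired with load considerations in balancing schemes.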
Recently, containers have been used extensively in cloud computing, and several frameworks have been proposed to schedule them according to a scheduling strategy. The main idea of these strategies is to select, from the set of nodes that forms the cloud platform, the most suitable node to execute each newly submitted container. The Spread strategy, the default in the Docker Swarmkit container scheduling framework, selects for each new container the node with the fewest running containers. In this paper, we propose to improve on the Spread strategy with a new container scheduling strategy based on the power consumption of heterogeneous cloud nodes. The novelty of our approach lies in finding the best compromise for reducing the global power consumption of a heterogeneous cloud infrastructure. Our strategy is based on learning and scheduling steps applied each time a user submits a new container. It is implemented in Go inside Docker Swarmkit. Experiments demonstrate the potential of our strategy under different scenarios.
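The contrast between the default Spread selection and a power-aware pick can be shown in a few lines. The node data and the per-container power figure are illustrative assumptions, and the single-criterion pick below stands in for the paper's learned compromise, which it does not reproduce.

```python
# Hypothetical sketch contrasting Spread with a power-aware selection.

def spread_select(nodes):
    """Spread: pick the node with the fewest running containers."""
    return min(nodes, key=lambda n: nodes[n]["containers"])

def power_aware_select(nodes):
    """Pick the node with the lowest marginal power cost per extra
    container (a simple stand-in for a learned power model)."""
    return min(nodes, key=lambda n: nodes[n]["watts_per_container"])

nodes = {
    "node-a": {"containers": 2, "watts_per_container": 30.0},  # older, power-hungry
    "node-b": {"containers": 5, "watts_per_container": 12.0},  # busier, efficient
}
print(spread_select(nodes), power_aware_select(nodes))
```

On heterogeneous hardware the two criteria disagree: Spread sends the container to the emptier but power-hungry node, while the power-aware rule prefers the efficient one; a practical strategy has to trade off load spreading against total power draw.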
Computational linguistics by Otman Manad
Papers by Otman Manad