ABSTRACT Plagiarism is one of the most pressing problems that computer science is trying to solve. In this article we present a new approach to automatic plagiarism detection in the world of mail services. Our system represents texts as character n-grams and uses TF-IDF weighting to measure the importance of each term in the corpus. We also combine several machine learning methods to decide whether a document is plagiarized. We use the PAN 09 corpus to build and evaluate the prediction model, and then simulate a metaheuristic method based on genetic algorithms, with varying parameters, to find out whether it can improve the results. The main objective of our work is to protect intellectual property and to improve the efficiency of plagiarism detection systems.
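The representation step described in this abstract (character n-grams weighted by TF-IDF) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the choice of n = 3, the plain log(N/df) form of IDF, and the cosine comparison are all assumptions made for illustration, and the classifier combination and genetic algorithm stages are not shown.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Map each document to a sparse vector {n-gram: tf * idf}.

    tf is the relative frequency of the n-gram in the document;
    idf is log(N / df), so an n-gram present in every document gets weight 0.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    doc_freq = Counter(gram for c in counts for gram in c)
    num_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        vectors.append({
            gram: (k / total) * math.log(num_docs / doc_freq[gram])
            for gram, k in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(gram, 0.0) for gram, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: the second text reuses the first one.
docs = [
    "the quick brown fox",
    "the quick brown fox jumps",
    "completely different text",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```

In a full pipeline, vectors of this kind would be the input features for the machine learning models mentioned in the abstract; the similarity check here only shows that reused text ends up closer in the n-gram space.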
ABSTRACT One of the biggest impediments to the evolution of big data is the privacy of users. Much advanced research has been done on this topic and many concepts have emerged. One is a cryptographic concept known as homomorphic encryption, which allows operations to be applied to encrypted data without first decrypting it. However, from the cryptographic point of view, homomorphic encryption has defects that make it only a potential solution. Some studies have shown that these cryptosystems are vulnerable to certain kinds of attacks, such as chosen-plaintext attacks (IND-CPA), chosen-ciphertext attacks (IND-CCA) and, for the majority of homomorphic cryptosystems that rely on the user's identity, chosen-identity attacks. On the other hand, a new type of cryptosystem was recently introduced whose aim is to improve classic cryptographic techniques, such as substitution and transposition, using evolutionary methods from data mining, e.g., genetic algorithms. The effectiveness of this kind of scheme has been shown in the IND-CPA and IND-CCA settings. In this paper, we improve the efficiency of a homomorphic cryptosystem known as TSZ (To, Safavi-Naini, and Zhang) by proposing a new approach that combines it with evolutionary cryptography in order to gain the advantages of both categories.
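The homomorphic property mentioned in this abstract (operating on ciphertexts without decrypting them) can be seen in a toy Python sketch. It uses textbook RSA, which is multiplicatively homomorphic, and not the TSZ scheme discussed in the paper. The key values are tiny illustrative numbers chosen for readability and provide no security.

```python
# Toy textbook-RSA parameters (illustrative only, far too small to be secure).
P, Q = 61, 53
N = P * Q   # modulus, 3233
E = 17      # public exponent
D = 2753    # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def encrypt(m):
    """Textbook RSA encryption without padding: c = m^E mod N."""
    return pow(m, E, N)


def decrypt(c):
    """Textbook RSA decryption: m = c^D mod N."""
    return pow(c, D, N)


a, b = 7, 6
# Multiply the two ciphertexts; the plaintexts are never exposed in this step.
product_cipher = (encrypt(a) * encrypt(b)) % N
print(decrypt(product_cipher))  # 42, i.e. a * b

# The same determinism that makes this identity simple is a weakness:
# equal plaintexts always yield equal ciphertexts, so unpadded RSA
# cannot be secure against chosen-plaintext attacks (IND-CPA).
print(encrypt(a) == encrypt(a))  # True
```

The second check is one concrete instance of the kind of weakness the abstract refers to: an attacker who can encrypt guesses can recognize a known plaintext from its ciphertext.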
Nowadays, big data keeps growing; recent studies report that 90% of all data on the web was created in the last two years. However, this growth runs into many critical challenges, most of them related to security: users care about how providers protect the privacy of their data. Access control, cryptography, and de-identification are the main research areas, grouped under a specific domain known as Privacy Preserving Data Publishing. In this paper, we propose a new model for access control over big data using digital signatures and confidence intervals. We first introduce some general concepts used to build our approach, then present the idea behind this work, and finally evaluate our system by conducting several experiments and discussing the results.
The development of new technologies has led the world to a tipping point. One of these technologies is big data, which has revolutionized computer science. Big data has come with new challenges, which can be summarized as the aim of creating scalable and efficient services that can process huge amounts of heterogeneous data in a short time while preserving users' privacy. Textual data occupy a large share of the internet, and these data may contain information that can be used to identify users. For that reason, the development of approaches that can detect and remove any identifiable information has become a critical research area known as de-identification. This paper tackles the problem of privacy in textual data. The authors' proposed approach uses artificial immune systems and MapReduce to detect and hide identifiable words, regardless of their variants, using the personal information from the user's profile. After many experiments, the sy...
Despite its emergence and its advantages in various domains, big data still suffers from major disadvantages. Timeliness, scalability, and privacy are the main problems that hinder the advance of big data. Privacy preservation has become a broad research area within the scientific community. This paper addresses the problem of privacy preservation over big data by combining access control and data de-identification techniques in order to provide a powerful system. The aim of this system is to accommodate all big data properties (volume, variety, velocity, veracity, and value) while ensuring the protection of users' identities. After many experiments and tests, our system shows high efficiency in detecting and hiding personal information while maintaining the utility of useful data. The remainder of this report covers the presentation of some known works in the privacy-preserving domain, the introduction of some basic concepts that are used to build our approach, the presentation of our syst...