In the last few years, the amount of data collected in various computer science applications has grown considerably. These large volumes of data need to be analyzed in order to extract useful hidden knowledge. This work focuses on association rule extraction, one of the most popular techniques in data mining. However, the number of extracted association rules is often very high, and many of them are redundant. In this paper, we propose a new algorithm called PRINCE. Its main feature is the construction of a partially ordered structure from which subsets of association rules, called generic bases, are extracted. These subsets represent the whole set of association rules without loss of information. To reduce the cost of this construction, the partially ordered structure is built from the minimal generators associated with the frequent closed patterns. The frequent closed patterns are then derived, simultaneously with the generic bases, through a simple bottom-up traversal of the resulting structure. The experiments we carried out on benchmark and "worst-case" contexts showed the efficiency of the proposed algorithm compared to algorithms such as CLOSE, A-CLOSE and TITANIC.
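To make the vocabulary of this abstract concrete (frequent closed patterns, minimal generators), here is a brute-force Python sketch on a tiny, hypothetical extraction context. It is not PRINCE: it simply enumerates every itemset, computes its closure through the Galois connection, groups itemsets with the same closure into one equivalence class, and keeps the minimal members of each class as its minimal generators. The context and the `minsupp` value are made up for illustration.

```python
from itertools import combinations

# Hypothetical toy extraction context: object id -> set of items.
CONTEXT = {
    1: {"A", "C", "D"},
    2: {"B", "C", "E"},
    3: {"A", "B", "C", "E"},
    4: {"B", "E"},
    5: {"A", "B", "C", "E"},
}

def extent(itemset):
    """Objects containing every item of `itemset`."""
    return frozenset(o for o, items in CONTEXT.items() if itemset <= items)

def closure(itemset):
    """Items shared by all objects of the extent (Galois closure)."""
    objs = extent(itemset)
    if not objs:
        return frozenset(set().union(*CONTEXT.values()))
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))

def closed_patterns_with_generators(minsupp):
    """Map each frequent closed pattern to (absolute support, minimal generators)."""
    items = sorted(set().union(*CONTEXT.values()))
    classes = {}  # closed pattern -> frequent itemsets having that closure
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if len(extent(s)) >= minsupp:
                classes.setdefault(closure(s), []).append(s)
    result = {}
    for closed, members in classes.items():
        # Minimal generators: members with no proper subset in the same class.
        gens = [m for m in members if not any(o < m for o in members)]
        result[closed] = (len(extent(closed)), gens)
    return result
```

For example, with `minsupp = 2` the closed pattern {A, C} has support 3 and the single minimal generator {A}. In the generic-basis literature, each minimal generator g yields an exact rule g => closure(g) \ g; the exhaustive enumeration above is exponential in the number of items, which is exactly the cost that dedicated algorithms try to avoid.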
... {tarek.hamrouni@fst.rnu.tn, hamrouni@cril.univ-artois.fr} ... On the other hand, in many real-life applications, such as market basket analysis, medical data analysis, social network analysis and bioinformatics, the disjunctive connector linking items can bring key information as ...
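The snippet is cut off before it defines anything, so the following only illustrates the usual notion of disjunctive support, assumed here: an object supports the disjunction of an itemset's items if it contains at least one of them. The sketch computes it directly and, as a cross-check, from conjunctive supports via the inclusion-exclusion principle. The context is hypothetical.

```python
from itertools import combinations

# Hypothetical toy context: object id -> set of items.
CONTEXT = {
    1: {"A", "C", "D"},
    2: {"B", "C", "E"},
    3: {"A", "B", "C", "E"},
    4: {"B", "E"},
    5: {"A", "B", "C", "E"},
}

def conjunctive_support(itemset):
    """Number of objects containing ALL items of the itemset."""
    return sum(1 for items in CONTEXT.values() if set(itemset) <= items)

def disjunctive_support(itemset):
    """Number of objects containing AT LEAST ONE item of the itemset."""
    return sum(1 for items in CONTEXT.values() if items & set(itemset))

def disjunctive_support_ie(itemset):
    """Same quantity via inclusion-exclusion over conjunctive supports."""
    items = list(itemset)
    total = 0
    for k in range(1, len(items) + 1):
        sign = 1 if k % 2 == 1 else -1  # add odd-sized, subtract even-sized subsets
        for subset in combinations(items, k):
            total += sign * conjunctive_support(subset)
    return total
```

For instance, the disjunction of A and D is supported by objects 1, 3 and 5, and inclusion-exclusion gives the same value: supp(A) + supp(D) - supp(AD) = 3 + 1 - 1 = 3.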
One of the most powerful techniques for studying protein structures is to look for recurrent fragments (also called substructures or spatial motifs) and then use them as patterns to characterize the proteins under study. An emerging trend consists of parsing the three-dimensional (3D) structures of proteins into graphs of amino acids. The search for recurrent spatial motifs is thus formulated as a frequent subgraph discovery process in which each subgraph represents a spatial motif. Several efficient approaches for frequent subgraph discovery have been proposed in the literature. However, the set of discovered frequent subgraphs is too large to be efficiently analyzed and explored in any further process. In this paper, we propose a novel pattern selection approach that shrinks the large number of discovered frequent subgraphs by selecting representative ones. Existing pattern selection approaches do not exploit domain knowledge; our approach, by contrast, incorporates the evolutionary information about amino acids encoded in substitution matrices in order to select the representative subgraphs. We show the effectiveness of our approach on a number of real datasets. Our experimental results show that the approach considerably decreases the number of motifs while enhancing their interestingness.
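The abstract does not give the selection algorithm itself. The sketch below only illustrates the general idea of letting substitution scores decide which motifs are redundant. Several simplifications are assumptions: each motif is reduced to the labels of its nodes listed in matching order over the same topology (which sidesteps graph isomorphism), the substitution scores are toy values rather than a real BLOSUM or PAM matrix, and the greedy "keep the most frequent, drop what it can stand in for" rule is hypothetical.

```python
# Hypothetical toy substitution scores (illustrative only, NOT a real BLOSUM/PAM matrix).
# A positive score means the two amino acids are treated as evolutionarily exchangeable.
SUBST = {
    ("I", "V"): 3, ("I", "L"): 2, ("L", "V"): 1,
    ("K", "R"): 2, ("D", "E"): 2, ("I", "K"): -3,
}

def score(a, b):
    """Symmetric lookup; identical residues get a toy score of 4, unknown pairs -1."""
    if a == b:
        return 4
    return SUBST.get((a, b), SUBST.get((b, a), -1))

def substitutable(m1, m2):
    """Two motifs (node labels in matching order over the same topology) are
    treated as similar if every aligned label pair has a positive score."""
    return len(m1) == len(m2) and all(score(a, b) > 0 for a, b in zip(m1, m2))

def select_representatives(motifs):
    """Greedy selection: scan motifs by decreasing frequency and keep a motif
    only if no already-kept representative is substitutable with it."""
    reps = []
    for labels, freq in sorted(motifs.items(), key=lambda kv: -kv[1]):
        if not any(substitutable(labels, r) for r in reps):
            reps.append(labels)
    return reps
```

With motifs `("I","K","D")`, `("V","R","E")`, `("L","K","D")` and `("K","K","D")` at decreasing frequencies, only the first and the last are kept: the second and third differ from the first only by exchangeable residues, whereas I and K score negatively.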
With the emergence of graph databases, the task of frequent subgraph discovery has been extensively addressed. Although the approaches proposed in the literature have made this task feasible, the number of discovered frequent subgraphs is still too high for them to be used efficiently in any further exploration. Feature selection for graph data is a way to reduce this number based on exact or approximate structural similarity. However, current structural similarity strategies are not efficient enough for many real-world applications, and the combinatorial nature of graphs makes them computationally very costly. In order to select a smaller yet structurally irredundant set of subgraphs, we propose a novel approach that mines the top-k topological representative subgraphs among the frequent ones. Our approach can detect hidden structural similarities, such as the density or the diameter of a subgraph, that existing approaches are unable to detect. In addition, it can easily be extended with any user-defined structural or topological attributes, depending on the sought properties. Empirical studies on real and synthetic graph datasets show that our approach is fast and scalable.
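The abstract names density and diameter as examples of topological attributes. A minimal sketch follows. It assumes each subgraph is given as an edge list of a connected, undirected simple graph. It computes such attributes with breadth-first search and then picks k mutually distant subgraphs by greedy farthest-point selection in attribute space. The attribute vector and the selection rule are assumptions for illustration, not the authors' method.

```python
from collections import deque
import math

def topo_vector(edges):
    """(nodes, edges, density, diameter) of a connected undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n, m = len(adj), len(edges)  # assumes no duplicate edges
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    diameter = 0
    for src in adj:  # BFS from every node; fine for small subgraphs
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        diameter = max(diameter, max(dist.values()))
    return (n, m, density, diameter)

def top_k_representatives(subgraphs, k):
    """Greedy farthest-point selection over topological attribute vectors."""
    if k <= 0 or not subgraphs:
        return []
    vecs = {name: topo_vector(e) for name, e in subgraphs.items()}
    names = sorted(vecs)
    chosen = [names[0]]  # deterministic seed: first name in sorted order
    while len(chosen) < min(k, len(names)):
        best = max((n for n in names if n not in chosen),
                   key=lambda n: min(math.dist(vecs[n], vecs[c]) for c in chosen))
        chosen.append(best)
    return chosen
```

Adding a user-defined attribute only means extending the tuple returned by `topo_vector`. In practice the attributes would likely need rescaling so that, for example, node counts do not dominate density.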
The APRESW workshop was a meeting point for individuals working on adaptive, personalized and recommender systems for the Social-semantic Web. The main objectives of this meeting were to gather state of the art ...
Formal Concept Analysis (FCA) is a data analysis method that enables the discovery of hidden knowledge in data. One kind of hidden knowledge extracted from data is association rules. Various quality measures have been reported in the literature to extract only relevant association rules. Given a dataset, choosing a good quality measure remains a challenging task for a user.
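To see why the choice of measure matters, the sketch below computes three standard measures (support, confidence and lift) for association rules over a hypothetical toy context. The context and the two rules compared are made up; they only show that different measures can rank the same rules in opposite orders.

```python
# Hypothetical toy context: one set of items per object.
CONTEXT = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
    {"A", "B", "C", "E"},
]

def supp(itemset):
    """Relative support: fraction of objects containing the itemset."""
    return sum(1 for t in CONTEXT if itemset <= t) / len(CONTEXT)

def measures(x, y):
    """Support, confidence and lift of the rule x => y."""
    s_xy = supp(x | y)
    conf = s_xy / supp(x)
    lift = conf / supp(y)
    return {"support": s_xy, "confidence": conf, "lift": lift}
```

Here B => E has support 0.8 and lift 1.25, while D => A has support 0.2 but lift of about 1.67. Both rules have confidence 1.0, so support prefers the first rule, lift prefers the second, and confidence cannot separate them.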
Multistrategy learning (MSL) consists of combining at least two different learning strategies to build a powerful system in which the drawbacks of the basic algorithms are avoided. In this scope, instance-based learning (IBL) techniques are often used as the basic component. However, one of the major drawbacks of IBL is the prototype selection problem, which consists of selecting a subset ...
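The abstract breaks off before describing its own method. For orientation only, the sketch shows one classic prototype selection technique, Hart's condensed nearest neighbour (CNN) rule. CNN keeps a subset of the training instances that still classifies the whole training set correctly with 1-NN. It is not presented as the paper's approach. The Euclidean distance and the toy data are assumptions.

```python
import math

def nearest_label(x, prototypes):
    """Label of the prototype closest to x (1-NN, Euclidean distance)."""
    return min(prototypes, key=lambda p: math.dist(p[0], x))[1]

def condensed_nn(training):
    """Hart's CNN: start with one instance, then repeatedly add any training
    instance the current prototypes misclassify, until a full pass adds nothing."""
    prototypes = [training[0]]
    changed = True
    while changed:
        changed = False
        for x, label in training:
            # The membership guard prevents endless re-adding when identical
            # points carry conflicting labels.
            if nearest_label(x, prototypes) != label and (x, label) not in prototypes:
                prototypes.append((x, label))
                changed = True
    return prototypes
```

On two well-separated clusters of three points each, CNN keeps just one prototype per cluster. The result depends on the order in which instances are scanned, a known weakness of this rule.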
The increasing growth of databases raises an urgent need for more accurate methods to better understand the stored data. In this scope, association rules have been extensively used for the analysis and comprehension of huge amounts of data. However, the number of generated rules is too large to be efficiently analyzed and explored in any further process. Association rule selection is a classical way to address this issue, yet new, innovative approaches are required to help decision makers. Hence, many interestingness measures have been defined to statistically evaluate and filter association rules. However, these measures present two major problems. On the one hand, they do not allow irrelevant rules to be eliminated; on the other hand, their abundance leads to heterogeneous evaluation results, which causes confusion in decision making. In this paper, we propose a two-winged approach to select statistically interesting and semantically incompara...
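The abstract breaks off mid-word, so its selection criterion is not stated here. One standard way to use several interestingness measures at once, without aggregating them into a single and possibly misleading score, is Pareto dominance. Under this rule, a rule is discarded only if another rule is at least as good on every measure and strictly better on at least one. The sketch illustrates that general idea with made-up score vectors (support, confidence, lift). Whether it corresponds to the paper's two-winged approach is an assumption.

```python
def dominates(a, b):
    """True if score vector a is >= b on every measure and > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(rules):
    """Keep the rules that no other rule dominates (a mutually incomparable set)."""
    return [r for r, s in rules.items()
            if not any(dominates(t, s) for q, t in rules.items() if q != r)]
```

With made-up scores r1 = (0.8, 1.0, 1.25), r2 = (0.2, 1.0, 1.67) and r3 = (0.2, 0.5, 1.0), rule r3 is dominated by r2 and is dropped. Rules r1 and r2 are kept because each beats the other on some measure. No weighting between measures is needed, which sidesteps the heterogeneity of their results.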
In recent years, the use of graphs has been the subject of much research, notably in databases, machine learning, bioinformatics and social network analysis. Frequent subgraph mining is a major challenge in the context of very large graph databases. In this paper, we present a new approach based on the MapReduce paradigm for large-scale frequent subgraph mining. The proposed approach offers a new partitioning technique that takes the characteristics of the data into account and improves on MapReduce's default partitioning. A performance study of our approach, carried out on a private cloud, showed its efficiency.
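The abstract does not detail the partitioning technique. MapReduce's default partitioning is hash-based: in Hadoop, a record goes to the reducer given by its key's hash modulo the number of reducers. To illustrate why a data-aware split can beat this default, the sketch compares per-partition load under the two schemes. It assumes, hypothetically, that the cost of mining a graph grows with its number of edges and that graph ids are integers. The greedy "largest graph to least-loaded partition" rule (LPT scheduling) is a classic load-balancing heuristic, not the paper's method.

```python
import heapq

def hash_partition(graphs, n):
    """Hash-style partitioning on integer graph ids (id mod n); returns loads."""
    loads = [0] * n
    for gid, size in graphs.items():
        loads[gid % n] += size
    return loads

def size_aware_partition(graphs, n):
    """Greedy LPT: assign graphs by decreasing size to the least-loaded partition."""
    heap = [(0, p) for p in range(n)]  # (current load, partition index)
    loads = [0] * n
    for gid, size in sorted(graphs.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)
        loads[p] = load + size
        heapq.heappush(heap, (load + size, p))
    return loads
```

With hypothetical edge counts {0: 90, 1: 10, 2: 80, 3: 10, 4: 70, 5: 10} and two partitions, id-based hashing puts all three large graphs on one partition (loads 240 and 30). The size-aware rule yields 120 and 150, so the slowest partition, which bounds the job's completion time, carries much less work.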
Multi-layer neural networks have been successfully applied in a wide range of supervised and unsupervised learning applications. Because they often produce incomprehensible models, they are not widely used in data mining applications. To avoid this limitation, comprehensible models have previously been introduced that use a priori knowledge to build the network architecture. They allow neural network methods ...
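The truncated abstract refers to using a priori knowledge to build the network architecture. A well-known way to do this is the KBANN rules-to-network translation of Towell and Shavlik. As an illustration only, not as the paper's own procedure, the sketch shows how one conjunctive propositional rule becomes a sigmoid unit. Positive antecedents get weight +OMEGA, negated ones get -OMEGA, and the bias -(P - 1/2)·OMEGA, where P is the number of positive antecedents, makes the unit fire only when the rule body holds. The value `OMEGA = 4.0` is an assumed setting.

```python
import math

OMEGA = 4.0  # assumed link weight for translated rules

def rule_to_unit(positive, negated):
    """Translate  head :- p1, ..., pP, not n1, ...  into (weights, bias)."""
    weights = {a: OMEGA for a in positive}
    weights.update({a: -OMEGA for a in negated})
    bias = -(len(positive) - 0.5) * OMEGA  # net input > 0 iff the body is satisfied
    return weights, bias

def activate(weights, bias, inputs):
    """Sigmoid output for boolean inputs (missing inputs count as 0)."""
    net = bias + sum(w * inputs.get(a, 0) for a, w in weights.items())
    return 1.0 / (1.0 + math.exp(-net))
```

For the rule `h :- a, b, not c`, the unit outputs about 0.88 when a and b are true and c is false, and about 0.12 when c is also true or when b is false. Because each unit corresponds to a known rule, the initial network stays readable, and training then refines the weights.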
This paper concerns the use of an object-oriented database for the analysis of protein sequences. We describe proteins either by bibliographic information or by prediction functions such as Prosite patterns [2, 5]. We propose to use concept lattices, a tool used in information retrieval to build thesauri, to classify protein sequences. This classification of proteins may help in finding sequence alignments or in discussing them.
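As an illustration of how a concept lattice groups sequences, the sketch enumerates all formal concepts (extent, intent) of a tiny binary context. In that context, hypothetical proteins are described by the hypothetical patterns they match. The protein and pattern names are placeholders, not real Prosite entries. Each concept is a maximal group of proteins together with the maximal set of patterns they all share, and ordering the concepts by extent inclusion gives the lattice.

```python
from itertools import combinations

# Hypothetical context: protein -> set of matched patterns (placeholder names).
CONTEXT = {
    "prot1": {"pat1", "pat2"},
    "prot2": {"pat2", "pat3"},
    "prot3": {"pat1", "pat2", "pat3"},
}
ALL_PATTERNS = frozenset().union(*CONTEXT.values())

def common_patterns(proteins):
    """Intent: patterns shared by all given proteins (all patterns if none given)."""
    if not proteins:
        return ALL_PATTERNS
    return frozenset.intersection(*(frozenset(CONTEXT[p]) for p in proteins))

def proteins_having(patterns):
    """Extent: proteins matching every given pattern."""
    return frozenset(p for p, pats in CONTEXT.items() if patterns <= pats)

def formal_concepts():
    """All (extent, intent) pairs, obtained by closing every subset of proteins."""
    concepts = set()
    names = sorted(CONTEXT)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            intent = common_patterns(subset)
            concepts.add((proteins_having(intent), intent))
    return concepts
```

This toy context yields four concepts. One of them groups prot1 and prot3 under the shared patterns {pat1, pat2}, which is the kind of class the abstract proposes to exploit. Enumerating all subsets is exponential, so it is only suitable for tiny contexts.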
This paper describes a new approach to problem solving that splits a problem's component parts between software and hardware. Our main idea arises from the combination of two previously published works. The first proposed a conceptual modelling environment in which the machine and the human expert interact. The second reported an algorithm based on a reconfigurable hardware system that outperforms all previously published genetic database scanning hardware and algorithms. Here we show how efficient the interaction between the machine and the expert becomes when the concept modelling is based on the reconfigurable hardware system: their cooperation is achieved at real-time interaction speed. The designed system has been partially applied to the recognition of primate splice junction sites in genetic sequences.
We propose a cooperative conceptual modelling environment in which two agents interact: the machine and the human expert. The former extracts knowledge from data using a symbolic-numeric machine learning system. The latter controls the learning process by accepting and validating the machine's results, or by criticizing those results or the explanations the system produces for them. The improvement of the conceptual modelling relies on the cooperation between the two agents. The results obtained with our method on the prediction of primate splice junction sites in genetic sequences are far better than those reported in the literature for other symbolic machine learning systems, and as good as those obtained with some artificial neural network methods reported to date. However, unlike neural networks, which lack argumentation, our system provides the user with a plausible explanation of its predictions.
Research Interests:
CAP
... 1060 Tunis, Tunisie 62307 Lens Cedex, France {tarek.hamrouni, sadok.benyahia}@fst.rnu.tn mephu@cril.univ-artois.fr ... On the other hand, a dense context has many frequently occurring items and/or strong correlations between several items and/or many items in each object. ...
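The snippet characterizes a dense context informally: many frequent items, strong correlations, and many items per object. A quick way to quantify the first and last of these, assumed here for illustration, is to compute the context's fill ratio (the fraction of object-item pairs present), the average number of items per object, and the relative frequency of each item. The two toy contexts are made up.

```python
def context_stats(context):
    """Fill ratio, average object length and per-item relative frequency."""
    items = set().union(*context)
    pairs = sum(len(obj) for obj in context)  # number of (object, item) pairs present
    fill = pairs / (len(context) * len(items))
    avg_len = pairs / len(context)
    freq = {i: sum(i in obj for obj in context) / len(context) for i in sorted(items)}
    return fill, avg_len, freq

# Hypothetical contexts: one set of items per object.
dense = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
sparse = [{"a"}, {"b"}, {"c"}, {"d"}]
```

The dense toy context has a fill ratio of about 0.83 and 2.5 items per object, with item "a" present in every object. The sparse one has a fill ratio of 0.25 and a single item per object. Correlations between items would need a separate measure, for example the lift of item pairs.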
