In this paper, we present a combiner for multiple data clusterings, based on a proposed Weighted Shared nearest neighbors Graph (WSnnG). While combining multiple classifiers (supervised learners) is now an active and mature area, only a limited body of contemporary research on combining multiple data clusterings (unsupervised learners) appears in the literature. The problem addressed in this paper is that of generating a reliable clustering that represents the natural cluster structure of a set of patterns, when a number of different clusterings of the data are available or can be generated. The underlying model of the proposed shared-nearest-neighbors-based combiner is a weighted graph whose vertices correspond to the set of patterns and are assigned relative weights equal to the ratio of a balancing factor to the size of their shared nearest neighbors population. Edges exist only between patterns that share a pre-specified portion of their nearest neighborhood. The graph can then be partitioned into a desired number of clusters. Preliminary experiments show promising results, and a comparison with a recent study supports the combiner's suitability to the defined problem domain.
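A minimal sketch (in Python, using numpy, networkx and scikit-learn) of how such a weighted shared nearest neighbors graph could be assembled from an ensemble of clusterings. The function name build_wsnng, the co-clustering-based notion of neighborhood, and the threshold and balance parameters are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans

def build_wsnng(labelings, threshold=0.5, balance=1.0):
    """labelings: list of label arrays (one per base clustering), each of shape (n,)."""
    n = len(labelings[0])
    # Neighborhood of a pattern: all patterns co-clustered with it in any base clustering.
    neighbors = [set() for _ in range(n)]
    for labels in labelings:
        for c in np.unique(labels):
            members = np.flatnonzero(labels == c)
            for i in members:
                neighbors[i].update(members)

    g = nx.Graph()
    for i in range(n):
        # Vertex weight: balancing factor over the size of the shared-neighbor population.
        g.add_node(i, weight=balance / max(len(neighbors[i]), 1))
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(neighbors[i] & neighbors[j])
            smaller = min(len(neighbors[i]), len(neighbors[j]))
            # Keep an edge only if the shared portion of the neighborhood is large enough.
            if smaller and shared / smaller >= threshold:
                g.add_edge(i, j, weight=shared)
    return g

# Example: three k-means runs on random data form the ensemble.
X = np.random.rand(200, 2)
ensemble = [KMeans(n_clusters=k, n_init=5).fit_predict(X) for k in (2, 3, 4)]
graph = build_wsnng(ensemble)
# The graph can then be partitioned into the desired number of clusters,
# e.g. with METIS or a spectral partitioner.
```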
Cluster analysis is an unsupervised learning technique that is widely used in the process of topic discovery from text. The research presented here proposes a novel unsupervised learning approach based on the aggregation of clusterings produced by different clustering techniques. By examining and combining two different clusterings of a document collection, the aggregation aims at revealing a better structure of the data rather than one imposed or constrained by the clustering method itself. When clusters of documents are formed, a process called topic extraction picks terms from the feature space (i.e., the vocabulary of the whole collection) to describe the topic of each cluster. It is proposed at this stage to re-compute term weights according to the revealed cluster structure. The work further investigates the adaptive setup of the parameters required for the clustering and aggregation techniques. Finally, a topic accuracy measure is developed and used, along with the F-measure, to evaluate and compare the extracted topics and the clustering quality, respectively, before and after the aggregation. Experimental evaluation shows that the aggregation can successfully improve clustering quality and topic accuracy over individual clustering techniques.
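A minimal sketch of the re-weighting step described above: once the aggregated clusters are formed, term weights are recomputed per cluster and the top-weighted terms are reported as each cluster's topic. The specific weighting used here (cluster-level term frequency scaled by an inverse cluster frequency) and the helper extract_topics are illustrative assumptions, not the paper's formula.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def extract_topics(docs, labels, top_k=5):
    vec = CountVectorizer(stop_words="english")
    tf = vec.fit_transform(docs).toarray()          # (n_docs, n_terms)
    vocab = np.array(vec.get_feature_names_out())
    clusters = np.unique(labels)

    # Cluster-level term frequencies.
    ctf = np.vstack([tf[labels == c].sum(axis=0) for c in clusters])
    # Penalize terms that appear across many clusters.
    icf = np.log((len(clusters) + 1) / (1 + (ctf > 0).sum(axis=0)))
    weights = ctf * icf

    return {c: list(vocab[np.argsort(weights[i])[::-1][:top_k]])
            for i, c in enumerate(clusters)}

docs = ["stocks rallied on earnings", "the team won the final match",
        "central bank raises rates", "player scores winning goal"]
labels = np.array([0, 1, 0, 1])                     # aggregated cluster labels
print(extract_topics(docs, labels, top_k=3))
```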
We recently introduced the idea of solving cluster ensembles using a Weighted Shared nearest neighbors Graph (WSnnG). Preliminary experiments have shown promising results in terms of integrating different clusterings into a combined one, such that the natural cluster structure of the data can be revealed. In this paper, we further study and extend the basic WSnnG. First, we introduce the use of a fixed number of nearest neighbors in order to reduce the size of the graph. Second, we use refined weights on the edges and vertices of the graph. Experiments show that it is possible to capture the similarity relationships between the data patterns on a compact, refined graph. Furthermore, the quality of the combined clustering based on the proposed WSnnG surpasses both the average quality of the ensemble and that of an alternative combining method based on partitioning the patterns' co-association matrix.
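For reference, a minimal sketch of the co-association baseline mentioned above: each pair of patterns is scored by the fraction of base clusterings that place them in the same cluster, and the resulting matrix is partitioned into the final clusters. Average-linkage agglomerative clustering on 1 − co-association is an illustrative choice of partitioning step, not necessarily the method used in the cited comparison.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def coassociation_consensus(labelings, n_clusters):
    labelings = np.asarray(labelings)               # (n_runs, n_patterns)
    n = labelings.shape[1]
    coassoc = np.zeros((n, n))
    for labels in labelings:
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    coassoc /= len(labelings)

    # 'metric' is named 'affinity' in scikit-learn < 1.2.
    model = AgglomerativeClustering(n_clusters=n_clusters,
                                    metric="precomputed",
                                    linkage="average")
    return model.fit_predict(1.0 - coassoc)

ensemble = [[0, 0, 1, 1, 2, 2],
            [1, 1, 0, 0, 0, 2],
            [0, 0, 0, 1, 1, 1]]
print(coassociation_consensus(ensemble, n_clusters=3))
```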
Over the past few years, there has been a renewed interest in the consensus clustering problem. Several new methods have been proposed for finding a consensus partition for a set of n data objects that optimally summarizes an ensemble. In this paper, we propose new consensus clustering algorithms with linear computational complexity in n. We consider clusterings generated with a random number of clusters, which we describe by categorical random variables. We introduce the idea of cumulative voting as a solution to the problem of cluster label alignment, where, unlike the common one-to-one voting scheme, a probabilistic mapping is computed. We seek a first summary of the ensemble that minimizes the average squared distance between the mapped partitions and the optimal representation of the ensemble, where the reference clustering is selected by maximizing the information content as measured by the entropy. We describe cumulative vote weighting schemes and corresponding algorithms to compute an empirical probability distribution summarizing the ensemble. Given the arbitrary number of clusters of the input partitions, we formulate the problem of extracting the optimal consensus as that of finding a compressed summary of the estimated distribution that preserves the maximum relevant information. An efficient solution is obtained using an agglomerative algorithm that minimizes the average generalized Jensen-Shannon divergence within clusters. The empirical study demonstrates significant gains in accuracy and superior performance compared to several recent consensus clustering algorithms.
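A minimal sketch of the cumulative-voting idea: each input partition's labels are mapped probabilistically onto a reference partition (chosen here as the one with maximum label entropy), and the mapped soft labels are accumulated into an empirical distribution per object. The plain unweighted accumulation below is an assumption; the paper's vote weighting schemes and the final agglomerative compression that minimizes the generalized Jensen-Shannon divergence are not reproduced here.

```python
import numpy as np

def entropy(labels):
    p = np.bincount(labels) / len(labels)
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def cumulative_voting(partitions):
    partitions = [np.asarray(p) for p in partitions]
    ref = max(partitions, key=entropy)               # reference clustering
    k_ref, n = ref.max() + 1, len(ref)
    votes = np.zeros((n, k_ref))
    for labels in partitions:
        k = labels.max() + 1
        # Contingency table between this partition and the reference.
        cont = np.zeros((k, k_ref))
        np.add.at(cont, (labels, ref), 1.0)
        mapping = cont / cont.sum(axis=1, keepdims=True)  # P(ref cluster | cluster)
        votes += mapping[labels]                     # soft vote for every object
    votes /= len(partitions)
    return votes                                     # empirical distribution per object

partitions = [[0, 0, 1, 1, 2, 2],
              [1, 1, 1, 0, 0, 0],
              [2, 2, 0, 0, 1, 1]]
print(cumulative_voting(partitions).round(2))
```

The per-object distributions returned here would then be compressed into the final consensus partition, for example by agglomeratively merging clusters of similar distributions.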