Crime analysis and prevention is a systematic approach for identifying and analyzing patterns and trends in crime. Our system can predict regions with a high probability of crime occurrence and can visualize crime-prone areas. With the increasing adoption of computerized systems, crime data analysts can help law enforcement officers speed up the process of solving crimes. About 10% of criminals commit about 50% of crimes. Although we cannot predict who the victims of a crime may be, we can predict the places where it is likely to occur. The K-means algorithm partitions data into groups, assigning each point to the group whose mean is nearest. K-means has an extension, the expectation-maximization algorithm, in which the data are partitioned based on their parameters. This easy-to-implement data mining framework works with the geospatial plot of crime and helps to improve the productivity of detectives and other law enforcement officers. This system can also be used f...
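To make the partitioning step concrete, the following is a minimal sketch of the standard Lloyd iteration for K-means on 2-D points. It is not the authors' system: the function name `kmeans`, the incident coordinates, and the choice of starting centroids are all hypothetical and chosen only for illustration.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iteration: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical incident coordinates (x, y); not taken from the paper.
incidents = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
hotspots, labels = kmeans(incidents, centroids=[incidents[0], incidents[3]])
```

Here `hotspots` holds one centre per cluster, which is the kind of point a geospatial plot could mark as a crime-prone area. The result depends on the starting centroids, so a real application would normally try several initializations.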
Many studies worldwide have examined the prevalence of Hepatitis C Virus (HCV) in human populations. Spatial data analysis and cluster detection are vital in HCV monitoring for discovering high-risk areas and helping decision makers form hypotheses about the cause of the disease. Egypt has been declared one of the countries with the highest prevalence of HCV worldwide. The anomalous distribution of HCV infection in Egypt has led several researchers to investigate the reasons for such a wide spread of HCV in the country. One way to help identify the areas with the highest disease rates is to provide detailed knowledge of the geographical distribution of HCV in Egypt. To that end, data mining analytical tools integrated with GIS can help visualize the distribution. The main purpose of this paper is therefore to present the spatial distribution of HCV in Egypt using case data obtained from the Egyptian health institute, the National Hepatology Tropical Medicine Research Institute (NHTMR). Visualizing the spatial analysis by means of GIS allows us to investigate statistical results in a form that is easily interpreted by non-experts.
We discuss types of clustering problems where error information associated with the data to be clustered is readily available and where error-based clustering is likely to be superior to clustering methods that ignore error. We focus on clustering derived data (typically parameter ...
The growing demand for link bandwidth and node capacity is a frequent phenomenon in IP network backbones. Within this context, traffic prediction is essential for the network operator. Traffic prediction can be based on link traffic or on origin-destination (OD) traffic, the latter giving better results. This work investigates a methodology for traffic prediction based on multidimensional OD traffic, focusing on the stage of short-term traffic prediction, using Principal Components Analysis for dimensionality reduction and a Local Linear Model based on K-means for prediction and trend analysis. The results, validated with data from a real network, show a margin of error that is satisfactory for use in practical situations.
An important task in text mining is finding the dominant topics (and their associated documents) in a collection of documents. While traditional clustering techniques, e.g., hierarchical clustering and K-means, are often used for this task, this paper explores a new clustering algorithm which is based on a shared nearest neighbor approach. Unlike traditional clustering algorithms, not all the data is
Smart meters offer exceptional opportunities to better understand energy consumption behaviour from the quantity of data being generated. One application is the separation of energy load profiles into clusters of related behaviour. This research measured the similarity between load profiles and grouped them into clusters using the k-means clustering algorithm. The clusters obtained, labelled “Gender (Male/Female), House (Rented/Owned) and customer status (Satisfied/Unsatisfied)”, reflect patterns of energy consumption. This provides valuable information for utilities to design specific electricity tariffs and to better target energy efficiency programs. The results show that 43% of energy customers were identified as extremely dissatisfied on the basis of their energy consumption.
The goal of the 2002-2003 Sandia National Laboratories Computer Science Clinic Project was to create a tool for simultaneous visualization of several different reductions of multi-dimensional data sets and their analysis. Analysis was done by implementing manual clustering and several automatic clustering algorithms including k-means, linkages, and DBSCAN density. Validity metrics were implemented to quantitatively compare different clusterings of the same data, assess the
Two-mode partitioning is a relatively new form of clustering that clusters both rows and columns of a data matrix. In this paper, we consider deterministic two-mode partitioning methods in which a criterion similar to k-means is optimized. A variety of optimization methods have been proposed for this type of problem. However, it is still unclear which method should be used, as various methods may lead to non-global optima. This paper reviews and compares several optimization methods for two-mode partitioning. Several known methods are discussed, and a new fuzzy steps method is introduced. The fuzzy steps method is based on the fuzzy c-means algorithm of Bezdek (1981) and the fuzzy steps approach of Heiser and Groenen (1997) and Groenen and Jajuga (2001). The performances of all methods are compared in a large simulation study. In our simulations, a two-mode k-means optimization method most often gives the best results. Finally, an empirical data set is used to give a practical example of two-mode partitioning.
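As an illustration of a criterion "similar to k-means" for two-mode partitioning, the sketch below computes the sum of squared deviations of every cell from the mean of its (row cluster, column cluster) block. This is the usual least-squares form of such a criterion and is assumed here; the abstract does not state the exact formula, and the function name `two_mode_criterion` and the toy matrix are hypothetical.

```python
def two_mode_criterion(matrix, row_labels, col_labels):
    """Sum of squared deviations of each cell from the mean of its
    (row cluster, column cluster) block; smaller means more homogeneous blocks."""
    blocks = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            blocks.setdefault((row_labels[i], col_labels[j]), []).append(value)
    means = {key: sum(vals) / len(vals) for key, vals in blocks.items()}
    return sum((value - means[(row_labels[i], col_labels[j])]) ** 2
               for i, row in enumerate(matrix)
               for j, value in enumerate(row))


# Toy data matrix with an obvious 2 x 2 block structure (illustrative values only).
X = [
    [1.0, 1.0, 5.0],
    [1.0, 1.0, 5.0],
    [9.0, 9.0, 3.0],
]
good = two_mode_criterion(X, row_labels=[0, 0, 1], col_labels=[0, 0, 1])
bad = two_mode_criterion(X, row_labels=[0, 1, 1], col_labels=[0, 1, 1])
```

The partition that matches the block structure gives a criterion of zero, while the mismatched one does not. An optimization method of the kind compared in the paper would search over row and column labels to make this value small.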
Clustering is a division of data into groups of similar objects. K-means has been used in much clustering work because of the simplicity of the algorithm. Our main effort is to parallelize the k-means clustering algorithm. The parallel version exploits the inherent parallelism of the Distance Calculation and Centroid Update phases. The parallel K-means algorithm is designed so that each of the P participating nodes is responsible for handling n/P data points. We ran the program on a Linux cluster with a maximum of eight nodes using the message-passing programming model. We examined its performance in terms of the percentage of correct answers and its speed-up. The outcome shows that our parallel K-means program performs relatively well on large datasets.
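The data-parallel structure described above can be sketched without any message-passing library. In this simplified, single-process simulation each "node" is just a slice of the data: it computes per-cluster partial sums and counts for its roughly n/P points, and a reduction step combines them into new centroids. The function names and the data are hypothetical, and a real implementation would run the per-node work concurrently and exchange the partial results as messages.

```python
from math import dist


def partial_sums(chunk, centroids):
    """Work done by one node: assign its points and accumulate per-cluster sums."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        j = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_step(points, centroids, num_nodes):
    """One k-means iteration with the data split across num_nodes simulated nodes."""
    chunks = [points[i::num_nodes] for i in range(num_nodes)]  # about n/P points each
    # Distance calculation phase: independent per node (sequential in this sketch).
    results = [partial_sums(chunk, centroids) for chunk in chunks]
    # Centroid update phase: reduce the partial sums and counts from all nodes.
    new_centroids = []
    for j, old in enumerate(centroids):
        total = sum(counts[j] for _, counts in results)
        if total == 0:  # empty cluster keeps its previous centroid
            new_centroids.append(tuple(old))
            continue
        new_centroids.append(tuple(
            sum(sums[j][d] for sums, _ in results) / total for d in range(len(old))))
    return new_centroids


data = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
start = [(0.0, 0.0), (10.0, 10.0)]
one_node = parallel_step(data, start, num_nodes=1)
three_nodes = parallel_step(data, start, num_nodes=3)
```

Because only sums and counts cross node boundaries, the amount of data exchanged per iteration depends on the number of clusters and dimensions, not on the number of points.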
This paper presents a novel real-time system for interacting with an application or video game via hand gestures. Our system detects and tracks a bare hand in a cluttered background using skin detection and a hand posture contour comparison algorithm after face subtraction, recognizes hand gestures via bag-of-features and a multiclass support vector machine (SVM), and builds a grammar that generates gesture commands to control an application. In the training stage, after the keypoints are extracted from every training image using the scale-invariant feature transform (SIFT), a vector quantization technique maps the keypoints of every training image into a histogram vector of unified dimension (bag-of-words) after K-means clustering. This histogram is treated as an input vector for a multiclass SVM to build the training classifier. In the testing stage, for every frame captured from a webcam, the hand is detected using our algorithm; the keypoints are then extracted from the small image that contains only the detected hand gesture and fed into the cluster model to map them into a bag-of-words vector, which is finally fed into the multiclass SVM training classifier to recognize the hand gesture.
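The vector quantization step can be sketched as follows: each descriptor is assigned to its nearest cluster centre (a "visual word"), and the counts form a fixed-length histogram regardless of how many keypoints the image produced. This is a minimal illustration under stated assumptions, not the paper's code: the function name `bag_of_words`, the toy 2-D descriptors, and the three-word codebook are hypothetical (real SIFT descriptors have 128 dimensions, and the codebook would come from K-means on training descriptors).

```python
from math import dist


def bag_of_words(descriptors, codebook):
    """Map a variable number of descriptors to a fixed-length, normalised histogram
    by assigning each descriptor to its nearest codebook entry (cluster centre)."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        word = min(range(len(codebook)), key=lambda j: dist(d, codebook[j]))
        hist[word] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy 2-D "descriptors" and a 3-word codebook, for illustration only.
codebook = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
image_descriptors = [(0.1, 0.2), (4.8, 5.1), (5.2, 4.9), (9.7, 0.3)]
histogram = bag_of_words(image_descriptors, codebook)
```

The resulting `histogram` has one entry per codebook word, so images with different numbers of keypoints all yield vectors of the same length, which is what a classifier such as an SVM requires as input.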
Clustering algorithms are well-established and widely used for solving data-mining tasks. Every clustering algorithm is composed of several solutions for specific sub-problems in the clustering process. These solutions are linked together in a clustering algorithm, and they define the process and the structure of the algorithm. Frequently, many of these solutions occur in more than one clustering algorithm. Mostly, new
In this study, DBSCAN and OPTICS, two of the more recent clustering algorithms in data mining, are compared with the much older K-means algorithm. The comparison was made by evaluating their cluster-finding performance on a synthetic database. The results show that the DBSCAN and OPTICS algorithms, which entered the literature more recently, have cluster-forming properties superior to those of the K-means algorithm.
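One concrete difference between the two families is that a density-based method can label isolated points as noise, whereas K-means must assign every point to some cluster. The following is a compact, quadratic-time sketch of the basic DBSCAN procedure, written for illustration only; the function name, the parameter values, and the synthetic points are assumptions and do not come from the study.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Basic DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
    return labels


# Synthetic data: two dense groups and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
result = dbscan(pts, eps=1.5, min_pts=3)
```

With these illustrative settings the isolated point receives the noise label instead of being forced into a cluster, and the number of clusters is discovered rather than specified in advance.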
This work is part of a large research project entitled "Oreillodule" aimed at developing tools for automatic speech recognition, translation, and synthesis for the Arabic language. Our attention has mainly been focused on presenting the semantic analyzer developed for the automatic comprehension of standard spontaneous Arabic speech. The findings on the effectiveness of the semantic decoder are
We show that adaptively sampled O(k) centers give a constant factor bi-criteria approximation for the k-means problem, with a constant probability. Moreover, these O(k) centers contain a subset of k centers which give a constant factor approximation, and can be found using LP-based techniques of Jain and Vazirani [JV01] and Charikar et al. [CGTS02]. Both these algorithms run in effectively O(nkd) time and extend the O(log k)-approximation achieved by the k-means++ algorithm of Arthur and Vassilvitskii [AV07].
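For readers unfamiliar with adaptive sampling, the sketch below shows the D² sampling rule used by k-means++: the first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance from the nearest center already chosen. It only illustrates the sampling step; the LP-based selection of a k-subset mentioned in the abstract is not shown. The function names, the seed, and the data are hypothetical, and `num_centers` can be set to a multiple of k to mimic sampling O(k) centers.

```python
import random
from math import dist


def d2_sample_centers(points, num_centers, seed=0):
    """Adaptive (D^2) sampling: each new center is drawn with probability
    proportional to its squared distance from the nearest center chosen so far."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < num_centers:
        weights = [min(dist(p, c) ** 2 for c in centers) for p in points]
        if sum(weights) == 0:  # every point coincides with a chosen center
            break
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(dist(p, c) ** 2 for c in centers) for p in points)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 20.0)]
centers = d2_sample_centers(pts, num_centers=3)
cost = kmeans_cost(pts, centers)
```

Points that are already chosen have weight zero and cannot be drawn again, so the sampled centers are distinct, and points far from all current centers are the most likely to be picked next.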
Data clustering is a technique for grouping a set of objects into a known number of groups. Several approaches are widely applied to data clustering so that objects within a cluster are similar and objects in different clusters are far away from each other. K-Means is one of the most familiar center-based clustering algorithms because it is easy to implement and converges quickly. However, K-Means is sensitive to initialization and can therefore become trapped in local optima. The Flower Pollination Algorithm (FPA) is a global optimization technique that avoids becoming trapped in a locally optimal solution. In this paper, a novel hybrid data clustering approach using the Flower Pollination Algorithm and K-Means (FPAKM) is proposed. The results of the proposed algorithm are compared with those of K-Means and FPA on eight datasets. The experimental results show that FPAKM performs better than FPA and K-Means.
The k-means clustering algorithm is the oldest and best-known method in cluster analysis. It has been widely studied, with various extensions, and applied in a variety of substantive areas. As the internet, social networks, and big data grow rapidly, multi-view data become more important. For analyzing multi-view data, various multi-view k-means clustering algorithms have been studied. However, most multi-view k-means clustering algorithms in the literature cannot perform feature reduction during clustering. In general, multi-view data sets often contain irrelevant feature components that may degrade the performance of these clustering algorithms. Multi-view data sets also tend to have a high feature dimension, so it is necessary to consider reducing the dimension for clustering algorithms. In this paper, a learning mechanism that lets the multi-view k-means algorithm automatically compute individual feature weights is constructed. It can reduce these irrelevant feature components in each view. A new multi-view k-means objective function is first proposed for constructing the learning mechanism for feature weights in multi-view clustering. A scheme for eliminating irrelevant features with small weights is then considered for feature reduction. Thus a new type of multi-view k-means, called feature-reduction multi-view k-means (FRMVK), is proposed. The computational complexity of FRMVK is also analyzed. Numerical and real data sets are used to compare FRMVK with other feature-weighted multi-view k-means algorithms. Experimental results and comparisons demonstrate the effectiveness and usefulness of the proposed FRMVK clustering algorithm.