
A Learning-Based EM Clustering for Circular Data with Unknown Number of Clusters

2020 ◽  
Vol 15 ◽  
pp. 42-51
Author(s):  
Shou-Jen Chang-Chien ◽  
Wajid Ali ◽  
Miin-Shen Yang

Clustering is a method for analyzing grouped data. Circular data are widely used in applications such as wind directions and the departure directions of migrating birds or animals. The expectation-maximization (EM) algorithm on mixtures of von Mises distributions is popular for clustering circular data. In general, however, the EM algorithm is sensitive to initialization, is not robust to outliers, and requires the number of clusters to be specified a priori. In this paper, we consider a learning-based schema for EM and then propose a learning-based EM algorithm on mixtures of von Mises distributions for clustering grouped circular data. The proposed clustering method requires no initialization, is robust to outliers, and automatically finds the number of clusters. Several numerical and real data sets are used to compare the proposed algorithm with existing methods. Experimental results and comparisons demonstrate the effectiveness and superiority of the proposed learning-based EM algorithm.
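For readers who want a concrete starting point, the sketch below implements a plain EM algorithm for a von Mises mixture, not the authors' learning-based variant: the number of clusters K and a random initialization are assumed, and the concentration update uses a standard closed-form approximation.

```python
import numpy as np
from scipy.special import i0

def vm_pdf(theta, mu, kappa):
    """von Mises density on the circle."""
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * i0(kappa))

def em_von_mises_mixture(theta, K, n_iter=100, seed=0):
    """Plain EM for a K-component von Mises mixture (fixed K, random start)."""
    rng = np.random.default_rng(seed)
    n = len(theta)
    mu = rng.uniform(0, 2 * np.pi, K)      # random initialization (what the paper avoids)
    kappa = np.ones(K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior responsibilities of each component for each angle.
        dens = np.stack([pi[k] * vm_pdf(theta, mu[k], kappa[k]) for k in range(K)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: mixing weights, circular means, and concentrations.
        nk = resp.sum(axis=0)
        pi = nk / n
        C = resp.T @ np.cos(theta)
        S = resp.T @ np.sin(theta)
        mu = np.arctan2(S, C) % (2 * np.pi)
        Rbar = np.sqrt(C**2 + S**2) / nk
        # Closed-form approximation to invert A(kappa) = I1(kappa)/I0(kappa) = Rbar.
        kappa = Rbar * (2 - Rbar**2) / (1 - Rbar**2 + 1e-12)
    return pi, mu, kappa, resp

# Example: two synthetic wind-direction clusters.
rng = np.random.default_rng(1)
angles = np.concatenate([rng.vonmises(0.5, 8, 200), rng.vonmises(3.0, 8, 200)]) % (2 * np.pi)
weights, means, concentrations, resp = em_von_mises_mixture(angles, K=2)
```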

2018 ◽  
Vol 2018 ◽  
pp. 1-14
Author(s):  
Karim El mokhtari ◽  
Serge Reboul ◽  
Georges Stienne ◽  
Jean Bernard Choquel ◽  
Benaissa Amami ◽  
...  

In this article, we propose a multimodel filter for circular data. The so-called Circular Interacting Multimodel filter is derived in a Bayesian framework with the circular normal (von Mises) distribution. The aim of the proposed filter is to obtain, in the circular domain, the same performance as the classical IMM filter in the linear domain. In our approach, the mixing and fusion stages of the Circular Interacting Multimodel filter are defined, respectively, from the a priori and a posteriori circular distributions of the state angle given the measurements and a set of models. We also propose a set of circular models used to detect vehicle maneuvers from heading measurements. The performance of the Circular Interacting Multimodel filter is assessed on synthetic data, and a vehicle maneuver detection application is demonstrated on real data.
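As a minimal illustration of the circular Bayesian machinery such a filter builds on, the sketch below fuses a von Mises prior on the heading with a von Mises measurement likelihood; the function and variable names are illustrative, not the article's notation.

```python
import numpy as np

def von_mises_fuse(mu_prior, kappa_prior, mu_meas, kappa_meas):
    """Fuse a von Mises prior with a von Mises measurement likelihood.

    The product of the two von Mises kernels is again a von Mises kernel whose
    parameters follow from the vector sum of the concentration-weighted
    mean directions.
    """
    c = kappa_prior * np.cos(mu_prior) + kappa_meas * np.cos(mu_meas)
    s = kappa_prior * np.sin(mu_prior) + kappa_meas * np.sin(mu_meas)
    mu_post = np.arctan2(s, c)      # posterior mean direction
    kappa_post = np.hypot(c, s)     # posterior concentration
    return mu_post, kappa_post

# Example: a confident heading prior near 10 degrees fused with a noisier
# measurement near 30 degrees pulls the estimate slightly toward 30 degrees.
mu, kappa = von_mises_fuse(np.deg2rad(10), 50.0, np.deg2rad(30), 10.0)
print(np.rad2deg(mu), kappa)
```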


2015 ◽  
Vol 45 (3) ◽  
pp. 729-758 ◽  
Author(s):  
Roel Verbelen ◽  
Lan Gong ◽  
Katrien Antonio ◽  
Andrei Badescu ◽  
Sheldon Lin

We discuss how to fit mixtures of Erlangs to censored and truncated data by iteratively using the EM algorithm. Mixtures of Erlangs form a very versatile, yet analytically tractable, class of distributions, making them suitable for loss modeling purposes. The effectiveness of the proposed algorithm is demonstrated on simulated data as well as real data sets.
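A much-simplified sketch of the idea is given below: EM for a mixture of Erlangs with fixed integer shapes and a common scale, on complete data only. The authors' algorithm additionally handles censoring and truncation and adjusts the shape parameters, which is omitted here.

```python
import numpy as np
from scipy.stats import gamma

def em_erlang_mixture(x, shapes, n_iter=200):
    """EM for a mixture of Erlangs with fixed integer shapes and a common scale."""
    x = np.asarray(x, dtype=float)
    shapes = np.asarray(shapes)            # fixed integer shape parameters
    M = len(shapes)
    alpha = np.full(M, 1.0 / M)            # mixing weights
    theta = x.mean() / shapes.mean()       # crude initial value for the common scale
    for _ in range(n_iter):
        # E-step: posterior probability that observation i comes from component j.
        dens = np.stack([alpha[j] * gamma.pdf(x, a=shapes[j], scale=theta) for j in range(M)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form updates for the weights and the common scale.
        alpha = resp.mean(axis=0)
        theta = (resp * x[:, None]).sum() / (resp * shapes[None, :]).sum()
    return alpha, theta

# Example: fit a three-component Erlang mixture to simulated loss amounts.
rng = np.random.default_rng(0)
losses = np.concatenate([rng.gamma(2, 100, 500), rng.gamma(7, 100, 300)])
weights, scale = em_erlang_mixture(losses, shapes=[1, 2, 7])
```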


2015 ◽  
Vol 2015 ◽  
pp. 1-13
Author(s):  
Jianwei Ding ◽  
Yingbo Liu ◽  
Li Zhang ◽  
Jianmin Wang

Condition monitoring systems are widely used to monitor the working condition of equipment, generating a vast amount and variety of telemetry data in the process. The main task of surveillance is to analyze these routinely collected telemetry data to help assess the working condition of the equipment. However, with the rapid increase in the volume of telemetry data, it is a nontrivial task to analyze all of it to understand the working condition of the equipment without any a priori knowledge. In this paper, we propose a probabilistic generative model called the working condition model (WCM), which simulates the process by which event sequence data are generated and depicts the working condition of equipment at runtime. With the help of WCM, we are able to analyze how event sequence data behave in different working modes and, at the same time, to detect the working mode of an event sequence (working condition diagnosis). Furthermore, we apply WCM to illustrative applications such as the automated detection of anomalous event sequences during equipment runtime. Our experimental results on real data sets demonstrate the effectiveness of the model.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has long been studied by researchers in numerous fields. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm not only acquires efficient and accurate clustering results but also self-adaptively provides a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA, which self-organizes and recognizes the number of clusters k based on the similarities in the data; it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a "blind" feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms existing algorithms in both accuracy and efficiency, under both sequential and parallel conditions.
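The sketch below illustrates the two-phase structure under a simplifying assumption: a greedy covering-style pass that creates a new center whenever a point lies outside a chosen radius of all existing centers (so k emerges from the data), followed by standard Lloyd iterations. The radius heuristic is an illustrative stand-in for the paper's covering algorithm, not its actual construction.

```python
import numpy as np

def covering_init(X, radius):
    """Greedy covering pass: add a new center whenever a point is farther than
    `radius` from every existing center, so k is not prespecified."""
    centers = [X[0]]
    for x in X[1:]:
        if np.min(np.linalg.norm(np.array(centers) - x, axis=1)) > radius:
            centers.append(x)
    return np.array(centers)

def lloyd(X, centers, n_iter=50):
    """Standard Lloyd iterations started from the covering centers."""
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = []
        for k in range(len(centers)):
            pts = X[labels == k]
            new_centers.append(pts.mean(axis=0) if len(pts) else centers[k])
        centers = np.array(new_centers)
    return centers, labels

# Example: three Gaussian blobs; the number of clusters emerges from the radius.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in ([0, 0], [3, 3], [0, 4])])
centers = covering_init(rng.permutation(X), radius=1.5)   # k not given in advance
centers, labels = lloyd(X, centers)
print(len(centers), "clusters found")
```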


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xiangfei Chen ◽  
David Trafimow ◽  
Tonghui Wang ◽  
Tingting Tong ◽  
Cong Wang

Purpose: The authors derive the necessary mathematics, provide computer simulations, provide links to free and user-friendly computer programs, and analyze real data sets.

Design/methodology/approach: Cohen's d, which indexes the difference in means in standard deviation units, is the most popular effect size measure in the social sciences and economics. Not surprisingly, researchers have developed statistical procedures for estimating the sample sizes needed to have a desirable probability of rejecting the null hypothesis given assumed values of Cohen's d, or for estimating the sample sizes needed to have a desirable probability of obtaining a confidence interval of a specified width. However, for researchers interested in using the sample Cohen's d to estimate the population value, these are insufficient. Therefore, it would be useful to have a procedure for obtaining the sample sizes needed to be confident that the sample Cohen's d is close to the population parameter the researcher wishes to estimate; this is an expansion of the a priori procedure (APP). The authors derive the necessary mathematics, provide computer simulations and links to free and user-friendly computer programs, and analyze real data sets to illustrate the main results.

Findings: In this paper, the authors answer the following two questions. The precision question: how close do I want my sample Cohen's d to be to the population value? The confidence question: what probability do I want to have of being within the specified distance?

Originality/value: To the best of the authors' knowledge, this is the first paper to estimate Cohen's effect size using the APP method. It is convenient for researchers and practitioners to use the online computing packages.
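A Monte Carlo sketch of the APP idea for Cohen's d is given below; the paper derives the sample sizes analytically, so the brute-force search here only illustrates the precision and confidence questions.

```python
import numpy as np

def app_sample_size_for_d(d_pop, f, c, n_max=2000, n_sim=4000, seed=0):
    """Smallest per-group n (searched in steps of 5) for which the sample
    Cohen's d lies within f of d_pop with probability at least c."""
    rng = np.random.default_rng(seed)
    for n in range(5, n_max, 5):
        x = rng.normal(d_pop, 1.0, size=(n_sim, n))   # group 1: mean d_pop, sd 1
        y = rng.normal(0.0, 1.0, size=(n_sim, n))     # group 2: mean 0, sd 1
        sp = np.sqrt((x.var(axis=1, ddof=1) + y.var(axis=1, ddof=1)) / 2)  # pooled sd
        d_hat = (x.mean(axis=1) - y.mean(axis=1)) / sp
        if np.mean(np.abs(d_hat - d_pop) <= f) >= c:
            return n
    return None

# Example: precision of 0.2 standard deviation units with 95% confidence.
print(app_sample_size_for_d(d_pop=0.5, f=0.2, c=0.95))
```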


2022 ◽  
Vol 7 (2) ◽  
pp. 1726-1741
Author(s):  
Ahmed Sedky Eldeeb ◽  
Muhammad Ahsan-ul-Haq ◽  
Mohamed S. Eliwa ◽  
...  

In this paper, a flexible probability mass function is proposed for modeling count data, especially asymmetric and over-dispersed observations. Some of its distributional properties are investigated. It is found that all its statistical and reliability properties can be expressed in explicit forms, which makes the proposed model useful in time series and regression analysis. Different estimation approaches, including maximum likelihood, moments, least squares, Anderson-Darling, Cramér-von Mises, and the maximum product of spacings estimator, are derived to obtain the best estimator for the real data. The performance of these estimation techniques is assessed via a comprehensive simulation study. The flexibility of the new discrete distribution is assessed using four distinctive real data sets (coronavirus, flood peaks, forest fire, and leukemia). Finally, the new probabilistic model can serve as an alternative to other competitive distributions available in the literature for modeling count data.


1999 ◽  
Vol 09 (03) ◽  
pp. 195-202 ◽  
Author(s):  
JOSÉ ALFREDO FERREIRA COSTA ◽  
MÁRCIO LUIZ DE ANDRADE NETTO

Determining the structure of data without prior knowledge of the number of clusters or any information about their composition is a problem of interest in many fields, such as image analysis, astrophysics, and biology. Partitioning a set of n patterns in a p-dimensional feature space must be done such that those in a given cluster are more similar to each other than to the rest. As there are approximately K^n/K! possible ways of partitioning the patterns among K clusters, finding the best solution is very hard when n is large. The search space increases further when the number of partitions is not known a priori. Although the self-organizing feature map (SOM) can be used to visualize clusters, the automation of knowledge discovery by SOM is a difficult task. This paper proposes region-based image processing methods to post-process the U-matrix obtained after the unsupervised learning performed by SOM. Mathematical morphology is applied to identify regions of neurons that are similar. The number of regions and their labels are found automatically and are related to the number of clusters in a multivariate data set. New data can be classified by labeling them according to the best-matching neuron. Simulations using data sets drawn from finite mixtures of p-variate normal densities are presented, along with the advantages and drawbacks of the method.
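The sketch below illustrates the post-processing idea under simplifying assumptions: a U-matrix is computed from a trained SOM weight grid, then thresholded, and its connected low-distance regions are labeled so the number of clusters can be read off automatically. The paper applies full mathematical-morphology operators rather than the plain threshold used here.

```python
import numpy as np
from scipy import ndimage

def u_matrix(weights):
    """Average distance of each SOM unit to its 4-connected neighbours;
    `weights` is a (rows, cols, dim) array from any trained SOM."""
    rows, cols, _ = weights.shape
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            U[i, j] = np.mean(dists)
    return U

def count_clusters(weights, threshold_quantile=0.5):
    """Threshold the U-matrix and label connected low-distance regions."""
    U = u_matrix(weights)
    low = U <= np.quantile(U, threshold_quantile)   # units in the interior of clusters
    labels, n_regions = ndimage.label(low)          # connected regions ~ clusters
    return n_regions, labels

# Toy weight grid: two flat blocks separated by a ridge of dissimilar units.
W = np.zeros((10, 10, 2))
W[:, 5:] = 5.0
print(count_clusters(W)[0])   # -> 2
```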


2017 ◽  
Vol 18 (2) ◽  
pp. 0233 ◽  
Author(s):  
Hassan S Bakouch ◽  
Sanku Dey ◽  
Pedro Luiz Ramos ◽  
Francisco Louzada

In this paper, we consider different estimation methods for the unknown parameters of a binomial-exponential 2 distribution. First, we briefly describe different frequentist approaches, namely the method of moments, modified moments, ordinary least-squares estimation, weighted least-squares estimation, percentile, maximum product of spacings, Cramér-von Mises type minimum distance, Anderson-Darling, and right-tail Anderson-Darling, and compare them using extensive numerical simulations. We apply our proposed methodology to three real data sets related to the total monthly rainfall during April, May, and September at São Carlos, Brazil.
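As an illustration of how two of the listed estimators are set up, the sketch below implements maximum likelihood and the Cramér-von Mises minimum-distance estimator against a generic CDF/log-density; an exponential model stands in for the binomial-exponential 2 distribution, whose pdf and CDF would be substituted.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cdf(x, lam):
    """Stand-in model CDF (exponential); replace with the BE2 CDF."""
    return 1.0 - np.exp(-lam * x)

def logpdf(x, lam):
    """Stand-in model log-density (exponential); replace with the BE2 pdf."""
    return np.log(lam) - lam * x

def mle(x):
    """Maximum likelihood estimate of the single parameter."""
    return minimize_scalar(lambda lam: -logpdf(x, lam).sum(),
                           bounds=(1e-6, 100), method="bounded").x

def cramer_von_mises(x):
    """Cramér-von Mises minimum-distance estimate of the single parameter."""
    x = np.sort(x)
    n = len(x)
    i = np.arange(1, n + 1)
    def objective(lam):
        return 1.0 / (12 * n) + np.sum((cdf(x, lam) - (2 * i - 1) / (2 * n)) ** 2)
    return minimize_scalar(objective, bounds=(1e-6, 100), method="bounded").x

# Example: simulated monthly rainfall totals.
rng = np.random.default_rng(0)
rainfall = rng.exponential(scale=60.0, size=40)
print(mle(rainfall), cramer_von_mises(rainfall))
```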

