Pointwise Mutual Information based Graph Laplacian Regularized Sparse Unmixing

2021 ◽  
Author(s):  
Sefa Kucuk ◽  
Seniha Esen Yuksel

Sparse unmixing (SU) aims to express the observed image signatures as a linear combination of pure spectra known a priori. Over the past ten years it has become a very popular technique for analyzing hyperspectral images (HSI), with promising results. In SU, utilizing spatial-contextual information allows for more realistic abundance estimation. To make full use of the spatial-spectral information, in this letter we propose a pointwise mutual information (PMI) based graph Laplacian regularization for SU. Specifically, we construct the affinity matrices via PMI by modeling the association between neighboring image features through a statistical framework, and then use them in the graph Laplacian regularizer. We also adopt a double reweighted $\ell_{1}$ norm minimization scheme to promote the sparsity of fractional abundances. Experimental results on simulated and real data sets demonstrate the effectiveness of the proposed method and its superiority over competing algorithms in the literature.
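
The two building blocks named in this abstract, pointwise mutual information and a graph Laplacian, can be illustrated with a minimal sketch. It is not the authors' construction: the abstract does not say how image features are quantized or how PMI values become affinities, so `pmi_table` only evaluates the textbook definition PMI(a, b) = log(p(a, b) / (p(a) p(b))) on hypothetical co-occurrence pairs, and `graph_laplacian` only forms the unnormalized Laplacian L = D - W from a given symmetric weight matrix. Both function names are assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """PMI(a, b) = log(p(a, b) / (p(a) * p(b))) for every observed pair.

    `pairs` is a list of (a, b) co-occurrences, e.g. hypothetical quantized
    features of neighboring pixels. Probabilities are empirical frequencies.
    """
    pair_counts = Counter(pairs)
    total = sum(pair_counts.values())
    left, right = Counter(), Counter()
    for (a, b), count in pair_counts.items():
        left[a] += count    # marginal count of the first element
        right[b] += count   # marginal count of the second element
    return {
        (a, b): math.log((count / total) / ((left[a] / total) * (right[b] / total)))
        for (a, b), count in pair_counts.items()
    }


def graph_laplacian(weights):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix.

    `weights` is a list of lists; D is the diagonal matrix of row sums.
    """
    n = len(weights)
    lap = [[-weights[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        lap[i][i] = sum(weights[i]) - weights[i][i]
    return lap
```

Pairs that co-occur more often than independence would predict receive positive PMI, and pairs that co-occur less often receive negative PMI. Any mapping from PMI to non-negative affinities (for example, clipping at zero) would be a further assumption beyond what the abstract states.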


2015 ◽  
Vol 2015 ◽  
pp. 1-13
Author(s):  
Jianwei Ding ◽  
Yingbo Liu ◽  
Li Zhang ◽  
Jianmin Wang

Condition monitoring systems are widely used to monitor the working condition of equipment, generating a vast amount and variety of telemetry data in the process. The main task of surveillance is to analyze these routinely collected telemetry data in order to understand the working condition of the equipment. However, with the rapid increase in the volume of telemetry data, analyzing all of it without any a priori knowledge is a nontrivial task. In this paper, we propose a probabilistic generative model called the working condition model (WCM), which can simulate how event sequence data are generated and describe the working condition of equipment at runtime. With the help of WCM, we can analyze how event sequence data behave in different working modes and detect the working mode of an event sequence (working condition diagnosis). Furthermore, we apply WCM to illustrative applications such as automated detection of anomalous event sequences during equipment runtime. Our experimental results on real data sets demonstrate the effectiveness of the model.


2005 ◽  
Vol 03 (05) ◽  
pp. 1021-1038
Author(s):  
AO YUAN ◽  
GUANJIE CHEN ◽  
CHARLES ROTIMI ◽  
GEORGE E. BONNEY

The existence of haplotype blocks transmitted from parents to offspring has been suggested recently. This has created an interest in the inference of the block structure and length. The motivation is that well-characterized haplotype blocks will make it easier to quickly map all the genes carrying human diseases. To study the inference of haplotype blocks systematically, we propose a statistical framework. In this framework, optimal haplotype block partitioning is formulated as a statistical model selection problem; missing data can be handled in a standard statistical way; population strata can be implemented; block structure inference and hypothesis testing can be performed; and prior knowledge, if present, can be incorporated to perform Bayesian inference. The algorithm runs in time linear in the number of loci, in contrast to the NP-hard complexity of many such algorithms. We illustrate the applications of our method on both simulated and real data sets.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xiangfei Chen ◽  
David Trafimow ◽  
Tonghui Wang ◽  
Tingting Tong ◽  
Cong Wang

Purpose: The authors derive the necessary mathematics, provide computer simulations, provide links to free and user-friendly computer programs, and analyze real data sets.
Design/methodology/approach: Cohen's d, which indexes the difference in means in standard deviation units, is the most popular effect size measure in the social sciences and economics. Not surprisingly, researchers have developed statistical procedures for estimating the sample sizes needed to have a desirable probability of rejecting the null hypothesis given assumed values for Cohen's d, or for estimating the sample sizes needed to have a desirable probability of obtaining a confidence interval of a specified width. However, for researchers interested in using the sample Cohen's d to estimate the population value, these are insufficient. Therefore, it would be useful to have a procedure for obtaining the sample sizes needed to be confident that the sample Cohen's d to be obtained is close to the population parameter the researcher wishes to estimate, an expansion of the a priori procedure (APP). The authors derive the necessary mathematics, provide computer simulations and links to free and user-friendly computer programs, and analyze real data sets to illustrate the main results.
Findings: In this paper, the authors answer the following two questions. The precision question: How close do I want my sample Cohen's d to be to the population value? The confidence question: What probability do I want to have of being within the specified distance?
Originality/value: To the best of the authors’ knowledge, this is the first paper to estimate Cohen's effect size using the APP method. The online computing packages are convenient for researchers and practitioners to use.
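
The precision and confidence questions can be made concrete with a brute-force simulation. The sketch below is not the authors' APP derivation or their online programs. It assumes two independent normal groups with unit variance and equal sample size `n`, computes Cohen's d with the pooled standard deviation (a common convention, assumed here), and estimates by Monte Carlo how often the sample d falls within `precision` of the population value `delta`. The function names `cohens_d` and `coverage` are hypothetical.

```python
import math
import random
import statistics


def cohens_d(x, y):
    """Sample Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)


def coverage(n, delta, precision, trials=2000, seed=0):
    """Monte Carlo estimate of P(|sample d - delta| <= precision).

    Assumes group 1 ~ N(delta, 1) and group 2 ~ N(0, 1), each of size n,
    so the population Cohen's d equals delta.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(delta, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(cohens_d(x, y) - delta) <= precision:
            hits += 1
    return hits / trials
```

Calling `coverage` for increasing `n` until the estimate exceeds the desired confidence gives a rough numerical answer to "how many participants do I need?", under the stated normality and equal-variance assumptions.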


BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Chaowang Lan ◽  
Hui Peng ◽  
Gyorgy Hutvagner ◽  
Jinyan Li

Background: A long noncoding RNA (lncRNA) can act as a competing endogenous RNA (ceRNA) to compete with an mRNA for binding to the same miRNA. Such an interplay between the lncRNA, miRNA, and mRNA is called a ceRNA crosstalk. As an miRNA may have multiple lncRNA targets and multiple mRNA targets, connecting all the ceRNA crosstalks mediated by the same miRNA forms a ceRNA network. Methods for constructing ceRNA networks have been developed in the literature. However, these methods are limited because they have not explored the expression characteristics of total RNAs.
Results: We propose a novel method for constructing ceRNA networks and apply it to a paired RNA-seq data set. First, the method uses a competition regulation mechanism to derive candidate ceRNA crosstalks. Second, it combines a competition rule and pointwise mutual information to compute a competition score for each candidate ceRNA crosstalk. Then, the ceRNA crosstalks with significant competition scores are selected to construct the ceRNA network. The key idea, pointwise mutual information, is well suited to measuring the complex point-to-point relationships embedded in ceRNA networks.
Conclusion: Computational experiments and results demonstrate that the ceRNA networks can capture important regulatory mechanisms of breast cancer, and have also revealed new insights into the treatment of breast cancer. The proposed method can be directly applied to other RNA-seq data sets for deeper disease understanding.


2020 ◽  
Vol 15 ◽  
pp. 42-51
Author(s):  
Shou-Jen Chang-Chien ◽  
Wajid Ali ◽  
Miin-Shen Yang

Clustering is a method for analyzing grouped data. Circular data are common in various applications, such as wind directions and the departure directions of migrating birds or animals. The expectation-maximization (EM) algorithm on mixtures of von Mises distributions is commonly used for clustering circular data. In general, however, the EM algorithm is sensitive to initial values, is not robust to outliers, and requires the number of clusters to be given a priori. In this paper, we consider a learning-based schema for EM and propose a learning-based EM algorithm on mixtures of von Mises distributions for clustering grouped circular data. The proposed clustering method requires no initialization, is robust to outliers, and automatically finds the number of clusters. Several numerical and real data sets are used to compare the proposed algorithm with existing methods. The experimental results and comparisons demonstrate the effectiveness and superiority of the proposed learning-based EM algorithm.


2021 ◽  
Author(s):  
Jakob Raymaekers ◽  
Peter J. Rousseeuw

Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. The proposed approach compares favorably to existing techniques in an extensive simulation study and on real data.
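
For reference, the two classical transformations named in this abstract can be written down directly. The sketch below implements only the standard textbook Box–Cox and Yeo–Johnson formulas for a single value and a given parameter `lam`. It does not include the modified transformations or the robust parameter estimator the authors propose, which the abstract does not specify.

```python
import math


def box_cox(y, lam):
    """Classical Box-Cox transform, defined for y > 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def yeo_johnson(y, lam):
    """Classical Yeo-Johnson transform, defined for any real y."""
    if y >= 0:
        if lam == 0:
            return math.log(y + 1.0)
        return ((y + 1.0) ** lam - 1.0) / lam
    # Negative branch uses the mirrored parameter 2 - lam.
    if lam == 2:
        return -math.log(-y + 1.0)
    return -(((-y + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam))
```

With `lam = 1` the Yeo–Johnson transform is the identity, and values of `lam` below 1 compress the right tail, which is why a single extreme outlier can pull a maximum likelihood estimate of `lam` away from the value that best normalizes the bulk of the data.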


Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 62
Author(s):  
Zhengwei Liu ◽  
Fukang Zhu

Thinning operators play an important role in the analysis of integer-valued autoregressive models, and the most widely used is binomial thinning. Inspired by the theory of extended Pascal triangles, a new thinning operator named extended binomial is introduced, which generalizes binomial thinning. Compared to the binomial thinning operator, the extended binomial thinning operator has two parameters and is more flexible in modeling. Based on the proposed operator, a new integer-valued autoregressive model is introduced, which can accurately and flexibly capture the dispersion features of count time series. Two-step conditional least squares (CLS) estimation is investigated for the innovation-free case, and conditional maximum likelihood estimation is also discussed. We also derive the asymptotic property of the two-step CLS estimator. Finally, three overdispersed or underdispersed real data sets are considered to illustrate the superior performance of the proposed model.
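
As background for the operator being generalized here, the following sketch simulates the classical INAR(1) model with binomial thinning, X_t = alpha ∘ X_{t-1} + eps_t, where alpha ∘ X is a sum of X independent Bernoulli(alpha) variables. The Poisson innovations and the function names are assumptions chosen for illustration. The extended binomial operator proposed in the paper is not reproduced, since the abstract does not define it.

```python
import math
import random


def binomial_thinning(x, alpha, rng):
    """alpha ∘ x: each of the x units survives independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_inar1(n, alpha, lam, x0=0, seed=0):
    """Simulate n steps of X_t = alpha ∘ X_{t-1} + eps_t with eps_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    path, x = [], x0
    for _ in range(n):
        x = binomial_thinning(x, alpha, rng) + poisson(lam, rng)
        path.append(x)
    return path
```

Under these assumptions the stationary mean is lam / (1 - alpha) and the marginal distribution is Poisson, so mean and variance coincide. Data with variance clearly above or below the mean are the overdispersed and underdispersed cases that motivate more flexible operators.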


Econometrics ◽  
2021 ◽  
Vol 9 (1) ◽  
pp. 10
Author(s):  
Šárka Hudecová ◽  
Marie Hušková ◽  
Simos G. Meintanis

This article considers goodness-of-fit tests for bivariate INAR and bivariate Poisson autoregression models. The test statistics are based on an L2-type distance between two estimators of the probability generating function of the observations: one entirely nonparametric and the other semiparametric, computed under the corresponding null hypothesis. The asymptotic distribution of the proposed test statistics is derived both under the null hypotheses and under alternatives, and consistency is proved. The case of testing bivariate generalized Poisson autoregression and the extension of the methods to dimensions higher than two are also discussed. The finite-sample performance of a parametric bootstrap version of the tests is illustrated via a series of Monte Carlo experiments. The article concludes with applications to real data sets and a discussion.

