
distance correlation
Recently Published Documents


TOTAL DOCUMENTS

209
(FIVE YEARS 74)

H-INDEX

21
(FIVE YEARS 3)

Author(s):  
Cheng Huang ◽  
Xiaoming Huo

Testing for independence plays a fundamental role in many statistical techniques. Among the nonparametric approaches, distance-based methods (such as distance correlation-based hypothesis testing for independence) have many advantages over the alternatives. A known limitation of distance-based methods is that their computational complexity can be high. In general, when the sample size is n, the computational complexity of a distance-based method, which typically requires computing all pairwise distances, can be O(n^2). Recent advances have shown that in the univariate case, a fast method with O(n log n) computational complexity and O(n) memory requirement exists. In this paper, we introduce a test of independence based on random projection and distance correlation, which achieves nearly the same power as the state-of-the-art distance-based approach, works in the multivariate case, and has O(nK log n) computational complexity and O(max{n, K}) memory requirement, where K is the number of random projections. Note that a saving is achieved when K < n / log n. We name our method Randomly Projected Distance Covariance (RPDC). The statistical theoretical analysis takes advantage of some techniques on random projection which are rooted in contemporary machine learning. Numerical experiments demonstrate the efficiency of the proposed method relative to numerous competitors.
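For orientation, the O(n^2) baseline mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the standard biased (V-statistic) sample distance correlation, not the RPDC method itself; the function names are hypothetical, and points are assumed to be given as tuples of coordinates.

```python
import math


def _centered_distances(points):
    """Double-centered Euclidean distance matrix for a list of points."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(xs, ys):
    """Biased sample distance correlation; O(n^2) time and memory."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal size, at least 2")
    n = len(xs)
    a = _centered_distances(xs)
    b = _centered_distances(ys)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n**2

    dcov2 = mean_product(a, b)            # squared distance covariance
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:                      # one sample is constant
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


# Univariate samples are passed as 1-tuples.
xs = [(float(i),) for i in range(10)]
ys = [(2.0 * i + 1.0,) for i in range(10)]
print(distance_correlation(xs, ys))
```

Both the time and the memory cost here grow quadratically with n because the full pairwise distance matrices are materialized, which is the cost the abstract's random-projection approach is meant to avoid.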


Technometrics ◽  
2021 ◽  
pp. 1-21
Author(s):  
Andi Wang ◽  
Juan Du ◽  
Xi Zhang ◽  
Jianjun Shi

2021 ◽  
Author(s):  
Javier Pardo-Diaz ◽  
Philip Poole ◽  
Mariano Beguerisse-Diaz ◽  
Charlotte Deane ◽  
Gesine Reinert

Even within well-studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes or proteins, using a network of gene coexpression data that includes functional annotations. Signed distance correlation has proved useful for the construction of unweighted gene coexpression networks. However, transforming correlation values into unweighted networks may lead to a loss of important biological information related to the intensity of the correlation. Here we introduce a principled method to construct weighted gene coexpression networks using signed distance correlation. These networks contain weighted edges only between those pairs of genes whose correlation value is higher than a given threshold. We analyse data from different organisms and find that networks generated with our method based on signed distance correlation are more stable and capture more biological information than networks obtained from Pearson correlation. Moreover, we show that weighted signed distance correlation networks capture more biological information than unweighted networks based on the same metric. While we use biological data sets to illustrate the method, the approach is general and can be used to construct networks in other domains.


2021 ◽  
Vol 17 (11) ◽  
pp. e1009548
Author(s):  
Qunlun Shen ◽  
Shihua Zhang

With the rapid accumulation of biological omics datasets, decoding the underlying relationships of cross-dataset genes becomes an important issue. Previous studies have attempted to identify differentially expressed genes across datasets. However, it is hard for them to detect interrelated ones. Moreover, existing correlation-based algorithms can only measure the relationship between genes within a single dataset or two multi-modal datasets from the same samples. It is still unclear how to quantify the strength of association of the same gene across two biological datasets with different samples. To this end, we propose Approximate Distance Correlation (ADC) to select interrelated genes with statistical significance across two different biological datasets. ADC first obtains the k most correlated genes for each target gene as its approximate observations, and then calculates the distance correlation (DC) for the target gene across two datasets. ADC repeats this process for all genes and then performs the Benjamini-Hochberg adjustment to control the false discovery rate. We demonstrate the effectiveness of ADC with simulation data and four real applications to select highly interrelated genes across two datasets. The four applications include 21 cancer RNA-seq datasets of different tissues; six single-cell RNA-seq (scRNA-seq) datasets of mouse hematopoietic cells across six different cell types along the hematopoietic cell lineage; five scRNA-seq datasets of pancreatic islet cells across five different technologies; and coupled single-cell ATAC-seq (scATAC-seq) and scRNA-seq data of peripheral blood mononuclear cells (PBMC). Extensive results demonstrate that ADC is a powerful tool to uncover interrelated genes with strong biological implications and is scalable to large-scale datasets. Moreover, the number of such genes can serve as a metric to measure the similarity between two datasets, which could characterize the relative difference of diverse cell types and technologies.
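The Benjamini-Hochberg adjustment named in the abstract above is a standard procedure and can be sketched independently of ADC. The function name and the example p-values below are illustrative assumptions, not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # are monotone in the rank of the raw p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values; genes whose adjusted value falls
# below the chosen FDR level (for example 0.05) would be selected.
p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p))
```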


2021 ◽  
Vol 9 ◽  
Author(s):  
Anil Kamat ◽  
Amrita Sah

Border closure or travel restriction is a critical issue, as closing the border early can badly affect the economy of the country, whereas substantial delay can put human lives at stake. While many papers discuss closing the border early in the pandemic, the question of when to close the border has not been addressed well. We have tried to estimate a date for closing the border by taking as reference a neighboring country with a high correlation in COVID-19 incidence. Here we have used non-linear methods to probe the landscape of correlation between temporal COVID-19 incidences and deaths. We have tested our method on two neighboring countries, Nepal and India, with open borders, where closing the borders is among the top priorities to reduce the spread and spill-out of variants. We have selected these countries as they have close connectivity and an intertwined socio-economic network, with thousands of people crossing the border every day. We found the distance correlation for COVID-19 incidence between these countries to be statistically significant (p < 0.001), with a lag of 6 days for maximum correlation. In addition, we analyzed the correlation for each wave and found that the distance correlation for the first wave is 0.8145 (p < 0.001) with a lag of 2 days, and the distance correlation for the second wave is 0.9685 (p < 0.001) without any lag. This study can be a critical planning tool for policymakers and public health practitioners to make an informed decision on border closure in the early days, as it is critically associated with the legal and diplomatic agreements and regulations between two countries.
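A lag search of the kind described in the abstract above might look like the following sketch. The pairing convention (leader at day t against follower at day t + lag), the function names, and the synthetic series are all assumptions made for illustration; the abstract does not specify how the lag was computed, and no real incidence data is used here.

```python
import math


def dcor_1d(x, y):
    """Biased sample distance correlation of two equal-length series."""
    n = len(x)

    def centered(v):
        d = [[abs(a - b) for b in v] for a in v]
        row = [sum(r) / n for r in d]
        grand = sum(row) / n
        return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
                for i in range(n)]

    def inner(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n))

    a, b = centered(x), centered(y)
    denom = math.sqrt(inner(a, a) * inner(b, b))
    if denom == 0.0:  # a constant series carries no dependence signal
        return 0.0
    return math.sqrt(max(inner(a, b), 0.0) / denom)


def best_lag(leader, follower, max_lag):
    """Return (lag, score) maximizing dcor of leader[t] vs follower[t + lag]."""
    if len(leader) != len(follower):
        raise ValueError("series must have equal length")
    scores = {}
    for lag in range(max_lag + 1):
        x = leader[: len(leader) - lag]   # drop the last `lag` days
        y = follower[lag:]                # drop the first `lag` days
        scores[lag] = dcor_1d(x, y)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Synthetic example: the follower repeats the leader two steps later.
leader = [(7 * t * t) % 11 for t in range(30)]
follower = [0, 0] + leader[:-2]
print(best_lag(leader, follower, max_lag=5))
```

The significance levels quoted in the abstract would additionally require a test such as a permutation test, which this sketch does not include.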


2021 ◽  
pp. SP512-2021-79
Author(s):  
Xiang-dong Wang ◽  
Sun-rong Yang ◽  
Le Yao ◽  
Tetsuo Sugiyama ◽  
Ke-yi Hu

Rugose corals are one of the major fossil groups in shallow-water environments. They played an important role in dividing and correlating Carboniferous strata during the last century, when regional biostratigraphic schemes were established, and may be useful for long-distance correlation. Carboniferous rugose corals document two evolutionary events. One is the Tournaisian recovery event, with abundant occurrences of typical Carboniferous rugose corals such as columellate taxa and a significant diversification of large, dissepimented corals. The other is the changeover of rugose coral composition at the mid-Carboniferous boundary, which is represented by the disappearance of many large dissepimented taxa with complex axial structures and the appearance of typical Pennsylvanian taxa characterized by compound rugose taxa. The biostratigraphic scales for rugose corals show a finer temporal resolution in the Mississippian than in the Pennsylvanian, which was probably caused by the Late Paleozoic Ice Age that resulted in glacial-eustatic changes and a lack of continuous Pennsylvanian carbonate strata. The Pennsylvanian rugose corals are totally missing in the Cimmerian Continent. High-resolution biostratigraphy of rugose corals has so far only been achieved in a few regions for the Mississippian time scale. In most regions, more detailed taxonomic work and precise correlations between different fossil groups are needed.


2021 ◽  
Vol 50 (9) ◽  
pp. 2755-2764
Author(s):  
Yusrina Andu ◽  
Muhammad Hisyam Lee ◽  
Zakariya Yahya Algamal

Stock markets feature in many financial studies. Nonetheless, much of this literature does not account for highly correlated stock market prices. In particular, studies on variable selection, grouping effects and robustness for high-dimensional stock market prices are scarce. Penalized linear regression using the elastic net is one of the recognized methods for variable selection. However, a lack of consistency in variable selection may reduce model performance. Hence, the adaptive elastic net with distance correlation (AEDC) is proposed in this study and compared against the elastic net, the adaptive elastic net with elastic weight and the adaptive elastic net with ridge weight. AEDC had a lower mean squared error when alpha increases from 0.05 to 0.95. Thus, the proposed method encourages grouping effects among the highly correlated variables and also improves model performance in the presence of robustness.

