Abstract
Testing for mutual independence among several random vectors is a challenging problem, and in recent years, it has gained significant attention in the statistics and machine learning literature. Most existing tests of independence deal with only two random vectors and do not have straightforward generalizations for testing mutual independence among more than two random vectors of arbitrary dimensions. On the other hand, various tests are available for mutual independence among several random variables, but these univariate tests do not have natural multivariate extensions. In this article, we propose two general recipes, one based on inter-point distances and the other based on linear projections, for constructing multivariate extensions of these univariate tests. Under appropriate regularity conditions, the resulting tests are consistent whenever the corresponding univariate tests are consistent. We carry out extensive numerical studies to compare the empirical performance of the proposed methods with that of state-of-the-art methods.
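To fix ideas, the projection-based recipe can be sketched schematically as follows (the notation \(\mathbf{X}^{(j)}_i\) for the \(i\)-th observation on the \(j\)-th random vector, and the use of a single set of directions, are for illustration only; the precise construction and the aggregation over projections are described in the article). Given unit vectors \(\mathbf{a}_1,\ldots ,\mathbf{a}_p\) of matching dimensions, one forms the projected univariate observations
$$ X^{(\mathbf{a},j)}_i=\mathbf{a}_j^{\top }\mathbf{X}^{(j)}_i, \qquad j=1,\ldots ,p,\ \ i=1,\ldots ,n, $$
and applies a univariate test of mutual independence to \(\bigl (X^{(\mathbf{a},1)}_i,\ldots ,X^{(\mathbf{a},p)}_i\bigr )\), \(i=1,\ldots ,n\); evidence against the null hypothesis of mutual independence of \(\mathbf{X}^{(1)},\ldots ,\mathbf{X}^{(p)}\) is then combined over several choices of \(\mathbf{a}=(\mathbf{a}_1,\ldots ,\mathbf{a}_p)\).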









References
Bilodeau, M., Nangue, A.G.: Tests of mutual or serial independence of random vectors with applications. J. Mach. Learn. Res. 18, 2518–2557 (2017)
Biswas, M., Sarkar, S., Ghosh, A.K.: On some exact distribution-free tests of independence between two random vectors of arbitrary dimensions. J. Stat. Plan. Inf. 175, 78–86 (2016)
Breitenberger, E.: Analogues of the normal distribution on the circle and the sphere. Biometrika 50, 81–88 (1963)
Chakraborty, S., Zhang, X.: Distance metrics for measuring joint dependence with application to causal inference. J. Am. Stat. Assoc. 114, 1638–1650 (2019)
Fan, Y., Lafaye de Micheaux, P., Penev, S., Salopek, D.: Multivariate nonparametric test of independence. J. Multivar. Anal. 153, 189–210 (2017)
Ferraty, F., Vieu, P.: Nonparametric Functional Data Analysis: Theory and Practice. Springer, Berlin (2006)
Friedman, J.H., Rafsky, L.C.: Graph-theoretic measures of multivariate association and prediction. Ann. Stat. 11, 377–391 (1983)
Fukumizu, K., Gretton, A., Lanckriet, G.R., Schölkopf, B., Sriperumbudur, B.K.: Kernel choice and classifiability for RKHS embeddings of probability distributions. Adv. Neural Inf. Process. Syst. 1750–1758 (2009)
Gaißer, S., Ruppert, M., Schmid, F.: A multivariate version of Hoeffding’s phi-square. J. Multivar. Anal. 101, 2571–2586 (2010)
Ghosh, A.K., Chaudhuri, P., Murthy, C.: On visualization and aggregation of nearest neighbor classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1592–1602 (2005)
Ghosh, A.K., Chaudhuri, P., Sengupta, D.: Classification using kernel density estimates: multiscale analysis and visualization. Technometrics 48, 120–132 (2006)
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13, 723–773 (2012)
Gretton, A., Fukumizu, K., Teo, C.H., Song, L., Schölkopf, B., Smola, A.: A kernel statistical test of independence. Adv. Neural Inf. Process. Syst. 585–592 (2007)
Gretton, A., Györfi, L.: Consistent nonparametric tests of independence. J. Mach. Learn. Res. 11, 1391–1423 (2010)
Heller, R., Gorfine, M., Heller, Y.: A class of multivariate distribution-free tests of independence based on graphs. J. Stat. Plan. Inf. 142, 3097–3106 (2012)
Heller, R., Heller, Y.: Multivariate tests of association based on univariate tests. Adv. Neural Inf. Process. Syst. 208–216 (2016)
Heller, R., Heller, Y., Gorfine, M.: A consistent multivariate test of association based on ranks of distances. Biometrika 100, 503–510 (2013)
Huang, C., Huo, X.: A statistically and numerically efficient independence test based on random projections and distance covariance. arXiv preprint arXiv:1701.06054 (2017)
Jin, Z., Matteson, D.S.: Generalizing distance covariance to measure and test multivariate mutual dependence via complete and incomplete v-statistics. J. Multivar. Anal. 168, 304–322 (2018)
Mardia, K.V., Jupp, P.E.: Directional Statistics. Wiley, New York (2009)
McDonald, G.C., Schwing, R.C.: Instabilities of regression estimates relating air pollution to mortality. Technometrics 15, 463–481 (1973)
Nelsen, R.B.: Nonparametric measures of multivariate association. Lecture Notes-Monograph Series 223–232 (1996)
Nelsen, R.B.: An Introduction to Copulas. Springer, New York (2006)
Newton, M.A.: Introducing the discussion paper by Székely and Rizzo. Ann. Appl. Stat. 3, 1233–1235 (2009)
Pfister, N., Bühlmann, P., Schölkopf, B., Peters, J.: Kernel-based tests for joint independence. J. R. Stat. Soc. Ser. B 80, 5–31 (2018)
Póczos, B., Ghahramani, Z., Schneider, J.: Copula-based kernel dependency measures. In: Proceedings of 29th International Conference on Machine Learning, pp. 1635–1642 (2012)
Quadrianto, N., Song, L., Smola, A.J.: Kernelized sorting. Adv. Neural Inf. Process. Syst. 1289–1296 (2009)
Rawat, R., Sitaram, A.: Injectivity sets for spherical means on \({\mathbb{R}}^n\) and on symmetric spaces. J. Fourier Anal. Appl. 6, 343–348 (2000)
Roy, A., Ghosh, A.K., Goswami, A., Murthy, C.A.: Some new copula based distribution-free tests for independence among several random variables. Sankhya Ser. A, to appear (2020). https://doi.org/10.1007/s13171-020-00207-2
Sarkar, S., Ghosh, A.K.: Some multivariate tests of independence based on ranks of nearest neighbors. Technometrics 60, 101–111 (2018)
Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K.: Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Stat. 41, 2263–2291 (2013)
Székely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependence by correlation of distances. Ann. Stat. 35, 2769–2794 (2007)
Úbeda-Flores, M.: Multivariate versions of Blomqvist’s beta and Spearman’s footrule. Ann. Inst. Stat. Math. 57, 781–788 (2005)
Acknowledgements
We are thankful to all anonymous reviewers for their careful reading of earlier versions of the article and for providing us with several helpful comments.
Appendix
Lemma 1
For any fixed \(\delta >0\), define \(p_{n,\sigma }(\delta ,F)= \sup _{\mathbf{a}} \Pr (\left| {{\mathbb {T}}}_{n,\sigma }(F^{\mathbf{a}})-{\mathbb T}_{\sigma }(F^{\mathbf{a}})\right| >\delta )\). If m(n) is a polynomial function of n, then \(\sum _{n=1}^\infty m(n)p_{n,\sigma }(\delta ,F)<\infty \).
Proof
Recall that \({\mathbb {T}}_{\sigma }(F^{\mathbf{a}})\) and \({\mathbb {T}}_{n,\sigma }(F^{\mathbf{a}})\) are defined as
$$ {\mathbb {T}}_{\sigma }(F^{\mathbf{a}})=\frac{\gamma _{\sigma }(C^{\mathbf{a}},\varPi )}{\gamma _{\sigma }(M,\varPi )} \qquad \text{and} \qquad {\mathbb {T}}_{n,\sigma }(F^{\mathbf{a}})=\frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M_n,\varPi _n)}, $$
where \(C^{\mathbf{a}}\), \(M\), \(\varPi \) and \(C^{\mathbf{a}}_n\), \(M_n\), \(\varPi _n\) are as defined in Section 2.2.
Now, observe that
$$ \left| {\mathbb {T}}_{n,\sigma }(F^{\mathbf{a}})-{\mathbb {T}}_{\sigma }(F^{\mathbf{a}})\right| \le \frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M_n,\varPi _n)\gamma _{\sigma }(M,\varPi )}\left| \gamma _{\sigma }(M_n,\varPi _n)-\gamma _{\sigma }(M,\varPi )\right| +\frac{1}{\gamma _{\sigma }(M,\varPi )}\left| \gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }(C^{\mathbf{a}},\varPi )\right| . $$
First consider the term
$$ \frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M_n,\varPi _n)\gamma _{\sigma }(M,\varPi )}\left| \gamma _{\sigma }(M_n,\varPi _n)-\gamma _{\sigma }(M,\varPi )\right| . $$
Note that \({\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}/\big ({\gamma _{\sigma }(M_n,\varPi _n)\gamma _{\sigma }(M,\varPi )}\big )\) is uniformly bounded and \(\left| \gamma _{\sigma }(M_n,\varPi _n)-\gamma _{\sigma }(M,\varPi )\right| \) is a non-random quantity that converges to 0 as n tends to infinity (see Roy et al. 2020, Lemma L2). Therefore, there exists \(n_0\ge 1\) such that for all \(n >n_0\),
$$ \frac{\gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)}{\gamma _{\sigma }(M_n,\varPi _n)\gamma _{\sigma }(M,\varPi )}\left| \gamma _{\sigma }(M_n,\varPi _n)-\gamma _{\sigma }(M,\varPi )\right| \le \frac{\delta }{2} \quad \text{with probability one.} $$
Note that this \(n_0\) does not depend on \(\mathbf{a}\). Again, for all \(n>n_0\),
$$ \Pr \left( \left| {\mathbb {T}}_{n,\sigma }(F^{\mathbf{a}})-{\mathbb {T}}_{\sigma }(F^{\mathbf{a}})\right| >\delta \right) \le \Pr \left( \frac{1}{\gamma _{\sigma }(M,\varPi )}\left| \gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }(C^{\mathbf{a}},\varPi )\right| >\frac{\delta }{2}\right) =\Pr \left( \left| \gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }(C^{\mathbf{a}},\varPi )\right| >\delta ^{*}\right) , $$
where \(\delta ^*=\gamma _{\sigma }(M,\varPi )\frac{\delta }{2}\) is a positive constant. So, it is enough to show the finiteness of
$$ \sum _{n=1}^{\infty } m(n)\, \sup _{\mathbf{a}} \Pr \left( \left| \gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }(C^{\mathbf{a}},\varPi )\right| >\delta ^{*}\right) . $$
Let \(F^{(\mathbf{a},1)},\ldots ,F^{(\mathbf{a},p)}\) be the distribution functions of \(X^{(\mathbf{a},1)},\ldots ,X^{(\mathbf{a},p)}\), respectively. Also define \(C^{\mathbf{a}*}_n\) to be the empirical joint distribution of \(\left( F^{(\mathbf{a},1)}(X^{(\mathbf{a},1)}_1),\ldots ,F^{(\mathbf{a},p)}(X^{(\mathbf{a},p)}_1)\right) ,\ldots ,\left( F^{(\mathbf{a},1)}(X^{(\mathbf{a},1)}_n),\ldots ,F^{(\mathbf{a},p)}(X^{(\mathbf{a},p)}_n)\right) \). Then, following Theorem 6 of Roy et al. (2020), we get
$$ \left| \gamma _{\sigma }(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }(C^{\mathbf{a}},\varPi )\right| \le \left| \gamma _{\sigma }^2(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }^2(C^{\mathbf{a}}_n,\varPi )\right| ^{\frac{1}{2}}+\gamma _{\sigma }(C^{\mathbf{a}}_n,C^{\mathbf{a}*}_n)+\gamma _{\sigma }(C^{\mathbf{a}*}_n,C^{\mathbf{a}}). $$
Now, from Lemma L2 in Roy et al. (2020), we can show that there exists \(n_1\ge 1\) (\(n_1\) does not depend on \(\mathbf{a}\)) such that for all \(n>n_1\), \(\text {Pr}\left( \left| \gamma _{\sigma }^2(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma }^2(C^{\mathbf{a}}_n,\varPi )\right| ^{\frac{1}{2}}> \delta ^{*}/3\right) =0\). Again, from Lemmas L3 and L4 in Roy et al. (2020), we have \(\text {Pr}\Bigl (\gamma _{\sigma }(C^{\mathbf{a}}_n,C^{\mathbf{a}*}_n)>\delta ^{*}/3\Bigr )<2p\exp \left( -\frac{n{\delta ^*}^2}{18pL^2}\right) \) and \(\text {Pr}\left( \gamma _{\sigma }(C^{\mathbf{a}*}_n,C^{\mathbf{a}})>\delta ^{*}/{3}\right) <\exp \left( -\frac{n}{2}\left( \frac{\delta ^{*}}{3}-\frac{2}{\sqrt{n}}\right) ^2\right) \), respectively, where L is a positive constant independent of \(\mathbf{a}\). So, to prove the lemma, it is sufficient to show that
$$ \sum _{n=1}^{\infty } m(n)\left[ 2p\exp \left( -\frac{n{\delta ^*}^2}{18pL^2}\right) +\exp \left( -\frac{n}{2}\left( \frac{\delta ^{*}}{3}-\frac{2}{\sqrt{n}}\right) ^2\right) \right] <\infty . $$
Clearly, this is true since m(n) is a polynomial function of n. \(\square \)
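To spell out this last step, note that both exponential bounds above are eventually dominated by a constant multiple of \(e^{-cn}\) for a suitable constant \(c>0\) (for instance, any \(c<\min \{{\delta ^*}^2/(18pL^2),\,{\delta ^*}^2/18\}\); the constant \(c\) is generic and introduced only for this remark), and for any polynomial \(m(n)\) the ratio test gives
$$ \frac{m(n+1)\,e^{-c(n+1)}}{m(n)\,e^{-cn}}=\frac{m(n+1)}{m(n)}\,e^{-c}\longrightarrow e^{-c}<1 \quad \text{as } n\rightarrow \infty , \qquad \text{so} \qquad \sum _{n=1}^{\infty } m(n)\,e^{-cn}<\infty . $$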
Lemma 2
Consider a sequence \(\{\sigma (n): n\ge 1\}\), which converges to \(\sigma _0>0\). For any fixed \(\delta >0\), define \({\tilde{p}}_n(\delta ,F)=\sup _{\mathbf{a}} \Pr (\left| {\mathbb T}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb T}_{\sigma _0}(F^{\mathbf{a}})\right| >\delta )\). If m(n) is a polynomial function of n, then \(\sum _{n=1}^\infty m(n){\tilde{p}}_{n}(\delta ,F)<\infty \).
Proof
Note that for any fixed \(\mathbf{a}\),
$$ \Pr \left( \left| {\mathbb {T}}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb {T}}_{\sigma _0}(F^{\mathbf{a}})\right| >\delta \right) \le \Pr \left( \left| {\mathbb {T}}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb {T}}_{n,\sigma _0}(F^{\mathbf{a}})\right| >\frac{\delta }{2}\right) +\Pr \left( \left| {\mathbb {T}}_{n,\sigma _0}(F^{\mathbf{a}})-{\mathbb {T}}_{\sigma _0}(F^{\mathbf{a}})\right| >\frac{\delta }{2}\right) . $$
So, in view of Lemma 1, it is enough to show that \(\sum _{n=1}^{\infty } m(n) \sup _{\mathbf{a}} \Pr (\left| {\mathbb {T}}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb {T}}_{n,\sigma _0}(F^{\mathbf{a}})\right| >\delta /2)\) is finite. Now, note that
$$ \left| {\mathbb {T}}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb {T}}_{n,\sigma _0}(F^{\mathbf{a}})\right| \le \underbrace{\gamma _{\sigma (n)}(C^{\mathbf{a}}_n,\varPi _n)\left| \frac{1}{\gamma _{\sigma (n)}(M_n,\varPi _n)}-\frac{1}{\gamma _{\sigma _0}(M_n,\varPi _n)}\right| }_{A_n}+\underbrace{\frac{1}{\gamma _{\sigma _0}(M_n,\varPi _n)}\left| \gamma _{\sigma (n)}(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma _0}(C^{\mathbf{a}}_n,\varPi _n)\right| }_{B_n}. $$
Note that while the term \(\gamma _{\sigma (n)}(C^{\mathbf{a}}_n,\varPi _n)\) is uniformly bounded, \(\left| \frac{1}{\gamma _{\sigma (n)}(M_n,\varPi _n)}-\frac{1}{\gamma _{\sigma _0}(M_n,\varPi _n)}\right| \) is a non-random quantity converging to 0 (follows from Lemma L6 in Roy et al. (2020)). Therefore, there exists a natural number \(n_0\) (independent of \(\mathbf{a}\)) such that for all \(n> n_0\), we have \(A_n \le \delta /4\) with probability one.
Now, \(\gamma _{\sigma _0}(M_n,\varPi _n)\) is a non-random quantity converging to \(\gamma _{\sigma _0}(M,\varPi )\). Also, from Lemma L6 in Roy et al. (2020), we get a non-random upper bound for \(|\gamma _{\sigma (n)}(C^{\mathbf{a}}_n,\varPi _n)-\gamma _{\sigma _0}(C^{\mathbf{a}}_n,\varPi _n)|\), which converges to 0. Since this upper bound does not depend on \(\mathbf{a}\), we get a natural number \(n_1\) (independent of \(\mathbf{a}\)) such that for all \(n> n_1\), \(B_n \le \delta /4\) with probability one.
Using these two facts, for all \(n>\max \{n_0,n_1\}\), we have \(A_n+B_n < \delta /2\) with probability one, and hence,
$$ \Pr \left( \left| {\mathbb {T}}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb {T}}_{n,\sigma _0}(F^{\mathbf{a}})\right| >\frac{\delta }{2}\right) =0 \quad \text{for all } n>\max \{n_0,n_1\} \text{ and all } \mathbf{a}. $$
This implies the finiteness of \(\sum _{n=1}^{\infty } m(n) \sup _{\mathbf{a}} \Pr (|{\mathbb T}_{n,\sigma (n)}(F^{\mathbf{a}})-{\mathbb T}_{n,\sigma _0}(F^{\mathbf{a}})|>\delta /2)\). \(\square \)
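For clarity, the decomposition into \(A_n\) and \(B_n\) used in the proof above is an instance of the elementary bound, valid for any positive reals \(u,v,s,t\),
$$ \frac{u}{v}-\frac{s}{t}=u\left( \frac{1}{v}-\frac{1}{t}\right) +\frac{u-s}{t}, \qquad \text{so that} \qquad \left| \frac{u}{v}-\frac{s}{t}\right| \le u\left| \frac{1}{v}-\frac{1}{t}\right| +\frac{|u-s|}{t}, $$
applied with \(u=\gamma _{\sigma (n)}(C^{\mathbf{a}}_n,\varPi _n)\), \(v=\gamma _{\sigma (n)}(M_n,\varPi _n)\), \(s=\gamma _{\sigma _0}(C^{\mathbf{a}}_n,\varPi _n)\) and \(t=\gamma _{\sigma _0}(M_n,\varPi _n)\).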