Structured regularization for conditional Gaussian graphical models

Abstract

Conditional Gaussian graphical models are a reparametrization of the multivariate linear regression model that explicitly exhibits (i) the partial covariances between the predictors and the responses, and (ii) the partial covariances between the responses themselves. Such models are particularly suited to interpretation, since partial covariances describe direct relationships between variables. In this framework, we propose a regularization scheme that enhances the learning of the model by driving the selection of the relevant input features with prior structural information. It comes with an efficient alternating optimization procedure that is guaranteed to converge to the global minimum. Besides showing competitive performance on artificial and real datasets, our method demonstrates capabilities for fine interpretation, as illustrated on three high-dimensional datasets from spectroscopy, genetics, and genomics.

Notes

  1. https://r-forge.r-project.org/projects/spring-pkg/.

  2. We set \(\mathbf {R}\) to a correlation matrix so as not to excessively penalize the LASSO or the group-LASSO, both of which use the same tuning parameter \(\lambda _1\) across the outcomes (and thus the same variance estimator).

  3. We also used the same seed and CV-folds for both scenarios.

  4. This value directly arises from the definition of the genetic distance itself.

References

  • Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4(1), 1–106 (2012)
  • Brown, P., Vannucci, M., Fearn, T.: Multivariate Bayesian variable selection and prediction. J. R. Stat. Soc. B 60(3), 627–641 (1998)
  • Brown, P., Fearn, T., Vannucci, M.: Bayesian wavelet regression on curves with applications to a spectroscopic calibration problem. J. Am. Stat. Assoc. 96, 398–408 (2001)
  • Chiquet, J., Grandvalet, Y., Ambroise, C.: Inferring multiple graphical structures. Stat. Comput. 21(4), 537–553 (2011)
  • de los Campos, G., Hickey, J., Pong-Wong, R., Daetwyler, H., Calus, M.: Whole genome regression and prediction methods applied to plant and animal breeding. Genetics 193(2), 327–345 (2012)
  • Efron, B.: The estimation of prediction error: covariance penalties and cross-validation (with discussion). J. Am. Stat. Assoc. 99, 619–642 (2004)
  • Ferreira, M., Satagopan, J., Yandell, B., Williams, P., Osborn, T.: Mapping loci controlling vernalization requirement and flowering time in Brassica napus. Theor. Appl. Genet. 90, 727–732 (1995)
  • Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)
  • Gasch, A., Spellman, P., Kao, C., Carmel-Harel, O., Eisen, M.B., Storz, G., Botstein, D., Brown, P.: Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11(12), 4241–4257 (2000)
  • Hans, C.: Elastic net regression modeling with the orthant normal prior. J. Am. Stat. Assoc. 106, 1383–1393 (2011)
  • Harville, D.: Matrix Algebra from a Statistician's Perspective. Springer, New York (1997)
  • Hebiri, M., van de Geer, S.: The smooth-lasso and other \(\ell _1 + \ell _2\)-penalized methods. Electron. J. Stat. 5, 1184–1226 (2011)
  • Hesterberg, T., Choi, N.M., Meier, L., Fraley, C.: Least angle and \(\ell _{1}\) penalized regression: a review. Stat. Surv. 2, 61–93 (2008)
  • Hoefling, H.: A path algorithm for the fused lasso signal approximator. J. Comput. Graph. Stat. 19(4), 984–1006 (2010)
  • Kim, S., Xing, E.: Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genet. 5(8), e1000587 (2009)
  • Kim, S., Xing, E.: Tree-guided group lasso for multi-task regression with structured sparsity. In: Proceedings of the 27th International Conference on Machine Learning, pp. 543–550 (2010)
  • Kim, S.J., Koh, K., Boyd, S., Gorinevsky, D.: \(\ell _1\) trend filtering. SIAM Rev. 51(2), 339–360 (2009)
  • Kole, C., Thorman, C., Karlsson, B., Palta, J., Gaffney, P., Yandell, B., Osborn, T.: Comparative mapping of loci controlling winter survival and related traits in oilseed Brassica rapa and B. napus. Mol. Breed. 1, 201–210 (2002)
  • Krishna, A., Bondell, H., Ghosh, S.: Bayesian variable selection using an adaptive powered correlation prior. J. Stat. Plan. Inference 139(8), 2665–2674 (2009)
  • Lajoie, M., Gascuel, O., Lefort, V., Brehelin, L.: Computational discovery of regulatory elements in a continuous expression space. Genome Biol. 13(11), R109 (2012). doi:10.1186/gb-2012-13-11-r109
  • Li, C., Li, H.: Variable selection and regression analysis for graph-structured covariates with an application to genomics. Ann. Appl. Stat. 4(3), 1498–1516 (2010)
  • Li, X., Panea, C., Wiggins, C., Reinke, V., Leslie, C.: Learning “graph-mer” motifs that predict gene expression trajectories in development. PLoS Comput. Biol. 6(4), e1000761 (2010)
  • Lorbert, A., Eis, D., Kostina, V., Blei, D., Ramadge, P.: Exploiting covariate similarity in sparse regression via the pairwise elastic net. In: Teh, Y.W., Titterington, D.M. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS-10), vol. 9, pp. 477–484 (2010)
  • Mardia, K., Kent, J., Bibby, J.: Multivariate Analysis. Academic Press, London (1979)
  • Marin, J.M., Robert, C.P.: Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer, New York (2007)
  • Obozinski, G., Wainwright, M., Jordan, M.: Support union recovery in high-dimensional multivariate regression. Ann. Stat. 39(1), 1–47 (2011)
  • Osborne, B., Fearn, T., Miller, A., Douglas, S.: Application of near infrared reflectance spectroscopy to compositional analysis of biscuits and biscuit doughs. J. Sci. Food Agric. 35, 99–105 (1984)
  • Osborne, M.R., Presnell, B., Turlach, B.A.: On the lasso and its dual. J. Comput. Graph. Stat. 9(2), 319–337 (2000)
  • Park, T., Casella, G.: The Bayesian lasso. J. Am. Stat. Assoc. 103, 681–686 (2008)
  • Rapaport, F., Zinovyev, A., Dutreix, M., Barillot, E., Vert, J.P.: Classification of microarray data using gene networks. BMC Bioinform. 8, 35 (2007)
  • Rothman, A., Levina, E., Zhu, J.: Sparse multivariate regression with covariance estimation. J. Comput. Graph. Stat. 19(4), 947–962 (2010)
  • Shannon, P.: MotifDb: An Annotated Collection of Protein-DNA Binding Sequence Motifs. R package version 1.4.0 (2013)
  • Slawski, M., zu Castell, W., Tutz, G.: Feature selection guided by structural information. Ann. Appl. Stat. 4, 1056–1080 (2010)
  • Sohn, K., Kim, S.: Joint estimation of structured sparsity and output structure in multiple-output regression via inverse-covariance regularization. JMLR W&CP 22, 1081–1089 (2012)
  • Städler, N., Bühlmann, P., van de Geer, S.: \(\ell _1\)-penalization for mixture regression models. Test 19(2), 209–256 (2010). doi:10.1007/s11749-010-0197-z
  • Stein, C.: Estimation of the mean of a multivariate normal distribution. Ann. Stat. 9, 1135–1151 (1981)
  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
  • Tibshirani, R., Taylor, J.: The solution path of the generalized lasso. Ann. Stat. 39(3), 1335–1371 (2011). doi:10.1214/11-AOS878
  • Tibshirani, R., Taylor, J.: Degrees of freedom in lasso problems. Ann. Stat. 40, 1198–1232 (2012)
  • Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. B 67, 91–108 (2005)
  • Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)
  • Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)
  • Yin, J., Li, H.: A sparse conditional Gaussian graphical model for analysis of genetical genomics data. Ann. Appl. Stat. 5, 2630–2650 (2011)
  • Yuan, X.T., Zhang, T.: Partial Gaussian graphical model estimation. IEEE Trans. Inform. Theory 60(3), 1673–1687 (2014)
  • Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005)

Acknowledgments

We would like to thank Mathieu Lajoie and Laurent Bréhélin for kindly sharing the dataset from Gasch et al. (2000). We also thank the reviewers for their questions and remarks, which helped us improve our manuscript. This project was conducted in the framework of the project AMAIZING funded by the French ANR. This work has been partially supported by the grant Reg4Sel from the French INRA-SelGen metaprogram.

Author information

Corresponding author

Correspondence to Julien Chiquet.

Appendix: Proofs

Most of the proofs rely on basic algebra and properties of the trace and the \(\mathrm {vec}\) operators (see e.g. Harville 1997), in particular

$$\begin{aligned}&\mathrm {tr}(\mathbf {A}^T \mathbf {B}\mathbf {C}\mathbf {D}^T) = \mathrm {vec}(\mathbf {A})^T (\mathbf {D}\otimes \mathbf {B}) \mathrm {vec}(\mathbf {C}),\\&\mathrm {vec}(\mathbf {A}\mathbf {B}\mathbf {C}) = (\mathbf {C}^T \otimes \mathbf {A}) \mathrm {vec}(\mathbf {B}), \\&(\mathbf {A}\otimes \mathbf {B}) (\mathbf {C}\otimes \mathbf {D}) = (\mathbf {A}\mathbf {C}) \otimes (\mathbf {B}\mathbf {D}). \end{aligned}$$
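
These identities can be checked numerically. The sketch below is a minimal NumPy illustration (the dimensions and random matrices are ours, chosen only so that the products are conformable); the vec operator stacks columns.

    import numpy as np

    rng = np.random.default_rng(0)
    vec = lambda M: M.flatten(order="F")          # column-stacking vec operator

    # tr(A^T B C D^T) = vec(A)^T (D kron B) vec(C)
    A, B = rng.standard_normal((4, 3)), rng.standard_normal((4, 5))
    C, D = rng.standard_normal((5, 2)), rng.standard_normal((3, 2))
    assert np.isclose(np.trace(A.T @ B @ C @ D.T), vec(A) @ np.kron(D, B) @ vec(C))

    # vec(A B C) = (C^T kron A) vec(B)
    A, B, C = rng.standard_normal((4, 3)), rng.standard_normal((3, 5)), rng.standard_normal((5, 2))
    assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

    # (A kron B)(C kron D) = (A C) kron (B D)
    A, B = rng.standard_normal((2, 3)), rng.standard_normal((4, 5))
    C, D = rng.standard_normal((3, 2)), rng.standard_normal((5, 3))
    assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))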

1.1 Derivation of Proposition 1

Concerning the two regularization terms in (7), we have \(\Vert {\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}\Vert _1 = \Vert {\varvec{\omega }}\Vert _1\) since the \(\ell _1\)-norm applies element-wise here and

$$\begin{aligned} \mathrm {tr}({\varvec{\varOmega }}_{\mathbf {y}\mathbf {x}} \mathbf {L}{\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1}) = {\varvec{\omega }}^T ({\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1} \otimes \mathbf {L}) {\varvec{\omega }}. \end{aligned}$$

As for the log-likelihood (5), we work on the trace term:

$$\begin{aligned}&\mathrm {tr}\left( \left( \mathbf {Y}+ \mathbf {X}{\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1} \right) ^T \left( \mathbf {Y}+ \mathbf {X}{\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1} \right) {\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}\right) \\&\quad = \mathrm {vec}\left( \mathbf {Y}+ \mathbf {X}{\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1}\right) ^T \left( {\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}\otimes \mathbf {I}_n\right) \mathrm {vec}\left( \mathbf {Y}+ \mathbf {X}{\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1}\right) \\&\quad = \left\| \left( {\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}\otimes \mathbf {I}_n\right) ^{1/2} \left( \mathrm {vec}(\mathbf {Y}) + \left( {\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1} \otimes \mathbf {X}\right) \mathrm {vec}({\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}) \right) \right\| _2^2 \\&\quad = \left\| \mathrm {vec}(\mathbf {Y}{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{1/2}) + \left( {\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1/2}\otimes \mathbf {X}\right) {\varvec{\omega }}\right\| _2^2. \end{aligned}$$

The rest of the proof is straightforward.
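
The same kind of numerical check applies to the whole chain of equalities. The sketch below is only an illustration with random inputs, in which a random positive definite matrix stands in for \({\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}\) and its square root is computed by eigendecomposition; it compares the initial trace form with the final squared-norm form.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, q = 20, 6, 3
    X, Y = rng.standard_normal((n, p)), rng.standard_normal((n, q))
    Oxy = rng.standard_normal((p, q))                       # Omega_xy
    M = rng.standard_normal((q, q))
    Oyy = M @ M.T + q * np.eye(q)                           # Omega_yy, positive definite

    w, V = np.linalg.eigh(Oyy)
    Oyy_half = V @ np.diag(np.sqrt(w)) @ V.T                # Omega_yy^{1/2}
    Oyy_mhalf = V @ np.diag(1 / np.sqrt(w)) @ V.T           # Omega_yy^{-1/2}
    vec = lambda A: A.flatten(order="F")

    R = Y + X @ Oxy @ np.linalg.inv(Oyy)
    lhs = np.trace(R.T @ R @ Oyy)
    rhs = np.linalg.norm(vec(Y @ Oyy_half) + np.kron(Oyy_mhalf, X) @ vec(Oxy)) ** 2
    assert np.isclose(lhs, rhs)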

1.2 Convexity lemma

Lemma 1

The function

$$\begin{aligned} -\frac{1}{n}\log L({\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}},{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}) + \frac{\lambda _2}{2} \mathrm {tr}\left( {\varvec{\varOmega }}_{\mathbf {y}\mathbf {x}}\mathbf {L}{\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1}\right) \end{aligned}$$

is jointly convex in \(({\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}},{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}})\) and admits at least one global minimum which is unique when \(n\ge q\) and \((\lambda _2\mathbf {L}+ \mathbf {S}_{\mathbf {x}\mathbf {x}})\) is positive definite.

The convexity of \(-\frac{1}{n}\log L({\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}},{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}})\) is proved in Yuan and Zhang (2014) (Proposition 1), and similar arguments apply directly to the case at hand. Uniqueness of the global minimum is related to strict convexity in both \({\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}\) and \({\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}\), and direct differentiation leads to the corresponding conditions.
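
Joint convexity can also be spot-checked numerically. The sketch below is only an illustration: it assumes the negative conditional log-likelihood implied by (5), namely \(-\frac{1}{2}\log \det {\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}} + \frac{1}{2n}\mathrm {tr}\big ((\mathbf {Y}+\mathbf {X}{\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1})^T(\mathbf {Y}+\mathbf {X}{\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1}){\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}\big )\) up to constants, and verifies the convexity inequality along a random segment of parameter pairs (the segment stays feasible since convex combinations of positive definite matrices are positive definite).

    import numpy as np

    rng = np.random.default_rng(2)
    n, p, q, lam2 = 30, 5, 3, 0.5
    X, Y = rng.standard_normal((n, p)), rng.standard_normal((n, q))
    L = np.eye(p)                                  # any positive semidefinite structuring matrix

    def objective(Oxy, Oyy):
        """Negative log-likelihood (up to constants) plus the smooth trace penalty."""
        R = Y + X @ Oxy @ np.linalg.inv(Oyy)
        nll = -0.5 * np.linalg.slogdet(Oyy)[1] + np.trace(R.T @ R @ Oyy) / (2 * n)
        return nll + 0.5 * lam2 * np.trace(Oxy.T @ L @ Oxy @ np.linalg.inv(Oyy))

    def random_point():
        Oxy = rng.standard_normal((p, q))
        M = rng.standard_normal((q, q))
        return Oxy, M @ M.T + q * np.eye(q)        # Omega_yy kept positive definite

    (A0, B0), (A1, B1) = random_point(), random_point()
    for t in np.linspace(0.0, 1.0, 11):
        f_mid = objective((1 - t) * A0 + t * A1, (1 - t) * B0 + t * B1)
        assert f_mid <= (1 - t) * objective(A0, B0) + t * objective(A1, B1) + 1e-9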

1.3 Proof of Proposition 2

The convexity of criterion \(J({\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}},{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}})\) in \(({\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}},{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}})\) is straightforward thanks to Lemma 1 and the fact that \(\Vert {\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}\Vert _1\) is also convex. One can then apply the results of Tseng and Yun (2009) and Tseng (2001) on the convergence of block coordinate descent for the minimization of nonsmooth separable functions. Since (7) is clearly separable in \(({\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}, {\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}})\) for the nonsmooth part induced by the \(\ell _1\)-norm, the alternating scheme is guaranteed to converge to the unique global minimum under the assumptions of Lemma 1. It remains to show that the two convex optimization subproblems (9a) and (9b) can be (efficiently) solved in practice.
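
The alternation itself is simple to organize. Below is a schematic Python sketch of the block coordinate descent loop; solve_omega_yy and solve_omega_xy are hypothetical placeholders for the solvers of the two subproblems discussed next, and the initialization and stopping rule are ours, not taken from the paper.

    import numpy as np

    def block_coordinate_descent(X, Y, L, lam1, lam2, solve_omega_xy, solve_omega_yy,
                                 max_iter=100, tol=1e-6):
        """Alternate between the two convex subproblems until the iterates stabilize."""
        n, p = X.shape
        q = Y.shape[1]
        Oxy = np.zeros((p, q))
        Oyy = np.linalg.inv(Y.T @ Y / n)           # = S_yy^{-1}, the solution of (9a) when Omega_xy = 0
        for _ in range(max_iter):
            Oyy_new = solve_omega_yy(X, Y, L, lam2, Oxy)             # subproblem (9a), closed form
            Oxy_new = solve_omega_xy(X, Y, L, lam1, lam2, Oyy_new)   # subproblem (9b), Elastic-Net
            delta = max(np.max(np.abs(Oxy_new - Oxy)), np.max(np.abs(Oyy_new - Oyy)))
            Oxy, Oyy = Oxy_new, Oyy_new
            if delta < tol:
                break
        return Oxy, Oyy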

Firstly, (9b) can be recast as an Elastic-Net problem, which in turn can be recast as a LASSO problem (see, e.g., Zou and Hastie 2005; Slawski et al. 2010). This is straightforward thanks to Proposition 1: when \(\hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}}\) is fixed, the solution to (9b) can be obtained via

$$\begin{aligned} \hat{{\varvec{\omega }}} = \mathop {\mathrm {arg\,min}}\limits _{{\varvec{\omega }}} \; \frac{1}{2} \left\| \mathbf {A}{\varvec{\omega }} - \mathbf {b}\right\| _2^2 + \frac{\lambda _2}{2}\, {\varvec{\omega }}^T \tilde{\mathbf {L}}\, {\varvec{\omega }} + \lambda _1 \Vert {\varvec{\omega }}\Vert _1, \end{aligned}$$
(13)

where \(\mathbf {A},\mathbf {b}\) and \(\tilde{\mathbf {L}}\) are defined by

$$\begin{aligned} \mathbf {A}&= \left( \hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}}^{-1/2} \otimes \mathbf {X}/\sqrt{n} \right) ,\\ \mathbf {b}&= - \mathrm {vec}\left( \mathbf {Y}\hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}}^{1/2}\right) / \sqrt{n} \quad \text {and}\quad \tilde{\mathbf {L}} = \hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}}^{-1} \otimes \mathbf {L}. \end{aligned}$$
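
These quantities are cheap to form with Kronecker products. The sketch below is an illustration, not the package implementation: the square roots are obtained by eigendecomposition of the symmetric positive definite \(\hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}}\), and the returned triple can be handed to any Elastic-Net solver, or to a LASSO solver after the augmentation used in (15) below.

    import numpy as np

    def elastic_net_data(X, Y, L, Oyy_hat):
        """Return the A, b and L_tilde of (13) for a fixed Omega_yy_hat."""
        n = X.shape[0]
        w, V = np.linalg.eigh(Oyy_hat)                     # Oyy_hat symmetric positive definite
        Oyy_half = V @ np.diag(np.sqrt(w)) @ V.T
        Oyy_mhalf = V @ np.diag(1 / np.sqrt(w)) @ V.T
        Oyy_inv = V @ np.diag(1 / w) @ V.T
        A = np.kron(Oyy_mhalf, X / np.sqrt(n))
        b = -(Y @ Oyy_half).flatten(order="F") / np.sqrt(n)
        L_tilde = np.kron(Oyy_inv, L)
        return A, b, L_tilde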

Secondly, (9a) can be solved analytically with simple matrix algebra. Differentiating the objective (7) with respect to \({\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{-1}\) leads to the quadratic form

$$\begin{aligned} {\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}} \mathbf {S}_{\mathbf {y}\mathbf {y}} {\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}} - {\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}} = {\varvec{\varOmega }}_{\mathbf {y}\mathbf {x}}(\lambda _2 \mathbf {L}+ \mathbf {S}_{\mathbf {x}\mathbf {x}}){\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}. \end{aligned}$$

After right-multiplying both sides by \(\mathbf {S}_{\mathbf {y}\mathbf {y}}\), it becomes clear that \({\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}} \mathbf {S}_{\mathbf {y}\mathbf {y}}\) and \({\varvec{\varOmega }}_{\mathbf {y}\mathbf {x}}(\lambda _2 \mathbf {L}+ \mathbf {S}_{\mathbf {x}\mathbf {x}}){\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}\mathbf {S}_{\mathbf {y}\mathbf {y}}\) commute and thus share the same eigenvectors \(\mathbf {U}\). This also induces the relationship \(\eta _j^2 -\eta _j = \zeta _j\) between their respective eigenvalues \(\eta _j\) and \(\zeta _j\), of which we seek the positive root in \(\eta _j\). To see that it exists, first note that \({\varvec{\varOmega }}_{\mathbf {y}\mathbf {x}} (\lambda _2 \mathbf {L}+ \mathbf {S}_{\mathbf {x}\mathbf {x}}) {\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}}\) and \(\mathbf {S}_{\mathbf {y}\mathbf {y}}\) may be assumed positive definite when \({\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}} \ne \mathbf {0}\) and \(n\ge q\); second, recall that the product of two positive definite matrices has positive eigenvalues. Hence \(\zeta _j>0\) and the positive root is \(\eta _j = (1 + \sqrt{1+4\zeta _j})/2\). We thus obtain

$$\begin{aligned} \hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}} = \mathbf {U}\mathrm {diag}(\varvec{\eta }) \mathbf {U}^{-1} \mathbf {S}_{\mathbf {y}\mathbf {y}}^{-1}. \end{aligned}$$
(14)

Direct inversion yields

$$\begin{aligned} \hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}}^{-1} = \mathbf {S}_{\mathbf {y}\mathbf {y}}\mathbf {U}\mathrm {diag}(\varvec{\eta }^{-1}) \mathbf {U}^{-1} \quad (=\hat{\mathbf {R}}). \end{aligned}$$

To get an expression for \(\hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}}\) which does not require additional matrix inversion, just note that

$$\begin{aligned} \mathbf {S}_{\mathbf {y}\mathbf {y}}^{-1}&= \left( {\varvec{\varOmega }}_{\mathbf {y}\mathbf {x}}\hat{{\varvec{\varSigma }}}_{\mathbf {x}\mathbf {x}}^{\lambda _2}{\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}} \mathbf {S}_{\mathbf {y}\mathbf {y}}\right) ^{-1} {\varvec{\varOmega }}_{\mathbf {y}\mathbf {x}}\hat{{\varvec{\varSigma }}}_{\mathbf {x}\mathbf {x}}^{\lambda _2}{\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}} \\&= \mathbf {U}\mathrm {diag}(\varvec{\zeta }^{-1})\mathbf {U}^{-1}{\varvec{\varOmega }}_{\mathbf {y}\mathbf {x}} \hat{{\varvec{\varSigma }}}_{\mathbf {x}\mathbf {x}}^{\lambda _2}{\varvec{\varOmega }}_{\mathbf {x}\mathbf {y}} \end{aligned}$$

where \(\hat{{\varvec{\varSigma }}}_{\mathbf {x}\mathbf {x}}^{\lambda _2} = (\lambda _2\mathbf {L}+ \mathbf {S}_{\mathbf {x}\mathbf {x}})\). Combined with (14), this last equality leads to

$$\begin{aligned} \hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}} = \mathbf {U}\mathrm {diag}(\varvec{\eta }/\varvec{\zeta }) \mathbf {U}^{-1} \hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {x}} \hat{{\varvec{\varSigma }}}_{\mathbf {x}\mathbf {x}}^{\lambda _2} \hat{{\varvec{\varOmega }}}_{\mathbf {x}\mathbf {y}} \quad (=\hat{\mathbf {R}}^{-1}), \end{aligned}$$

which requires no inversion of \(\mathbf {S}_{\mathbf {y}\mathbf {y}}\).

Finally, in the particular case where \(\hat{{\varvec{\varOmega }}}_{\mathbf {x}\mathbf {y}} = \mathbf {0}\), \(\hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}} = \mathbf {S}_{\mathbf {y}\mathbf {y}}^{-1}\).
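
This closed-form update can be coded directly from the eigendecomposition above. The sketch below is our own illustration (assuming \(\mathbf {S}_{\mathbf {x}\mathbf {x}}\) and \(\mathbf {S}_{\mathbf {y}\mathbf {y}}\) denote the empirical covariance matrices of \(\mathbf {X}\) and \(\mathbf {Y}\)); it returns both \(\hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}}\) and \(\hat{\mathbf {R}} = \hat{{\varvec{\varOmega }}}_{\mathbf {y}\mathbf {y}}^{-1}\), the only inversion being that of the \(q\times q\) eigenvector matrix \(\mathbf {U}\) (and of \(\mathbf {S}_{\mathbf {y}\mathbf {y}}\) in the special case \(\hat{{\varvec{\varOmega }}}_{\mathbf {x}\mathbf {y}} = \mathbf {0}\)).

    import numpy as np

    def update_omega_yy(X, Y, L, lam2, Oxy):
        """Closed-form update of Omega_yy for a fixed Omega_xy (a sketch of Eq. (14))."""
        n = X.shape[0]
        S_yy = Y.T @ Y / n                         # empirical covariances (our convention)
        S_xx = X.T @ X / n
        if not np.any(Oxy):                        # special case Omega_xy = 0
            return np.linalg.inv(S_yy), S_yy
        Sig_lam2 = lam2 * L + S_xx
        M = Oxy.T @ Sig_lam2 @ Oxy @ S_yy          # shares its eigenvectors U with Omega_yy S_yy
        zeta, U = np.linalg.eig(M)                 # eigenvalues are real and positive here
        eta = (1 + np.sqrt(1 + 4 * zeta)) / 2      # positive roots of eta^2 - eta = zeta
        U_inv = np.linalg.inv(U)
        Oyy = U @ np.diag(eta / zeta) @ U_inv @ Oxy.T @ Sig_lam2 @ Oxy       # = R_hat^{-1}
        R_hat = S_yy @ U @ np.diag(1 / eta) @ U_inv                          # = Oyy^{-1}
        return np.real(Oyy), np.real(R_hat)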

1.4 Derivation of Proposition 3

To apply the results developed in Tibshirani and Taylor (2012), which rely on the well-known Stein's Lemma (Stein 1981), we basically need to recast our problem as a classical LASSO applied to a Gaussian linear regression model in which the response vector follows a normal distribution of the form \(\mathscr {N}({\varvec{\mu }},\sigma \mathbf {I})\). This is straightforward by means of Proposition 1: in the same way as we derived expression (13), we can go a little further and reach the following LASSO formulation

$$\begin{aligned} \hat{{\varvec{\omega }}} = \mathop {\mathrm {arg\,min}}\limits _{{\varvec{\omega }}} \; \frac{1}{2} \left\| \begin{pmatrix} \mathbf {A}\\ \sqrt{\lambda _2}\, \tilde{\mathbf {L}}^{1/2} \end{pmatrix} {\varvec{\omega }} - \begin{pmatrix} \mathbf {b}\\ \mathbf {0}\end{pmatrix} \right\| _2^2 + \lambda _1 \Vert {\varvec{\omega }}\Vert _1, \end{aligned}$$
(15)

with \(\mathbf {A},\mathbf {b}\) and \(\tilde{\mathbf {L}}\) defined as in (13). From model (4), it is not difficult to see that \(\mathbf {b}\) corresponds to a decorrelated, vectorized form of \(\mathbf {Y}\), so that

$$\begin{aligned} \begin{pmatrix} \mathbf {b} \\ \mathbf {0}\end{pmatrix} = \begin{pmatrix} -\mathrm {vec}(\mathbf {Y}{\varvec{\varOmega }}_{\mathbf {y}\mathbf {y}}^{1/2}) \\ \mathbf {0}\end{pmatrix} \sim \mathscr {N}({\varvec{\mu }}, \mathbf {I}_{nq}). \end{aligned}$$

The explicit form of \({\varvec{\mu }}\) is of no interest here. The point is essentially to underline that the response vector is uncorrelated in (15), which allows us to apply Theorem 2 of Tibshirani and Taylor (2012) and results therein, notably for the Elastic-Net. By these means, an unbiased estimator of the degrees of freedom in (15) can be written as a function of the active set \(\mathscr {A}\) in \(\hat{{\varvec{\omega }}}^{\lambda _1,\lambda _2}\):

$$\begin{aligned} \hat{\mathrm {df}}_{\lambda _1,\lambda _2} = \mathrm {tr}\left( \mathbf {A}_{\mathscr {A}} \left( (\mathbf {A}^T\mathbf {A}+ \lambda _2 \tilde{\mathbf {L}})_{\mathscr {A}\mathscr {A}}\right) ^{-1} \mathbf {A}_{\mathscr {A}}^T \right) . \end{aligned}$$

Routine simplifications lead to the desired result for the degrees of freedom.
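
Computing this estimate is immediate once \(\mathbf {A}\), \(\tilde{\mathbf {L}}\) and the active set are available. The sketch below is a direct, illustrative transcription of the formula, taking the active set \(\mathscr {A}\) as the nonzero coordinates of the current \(\hat{{\varvec{\omega }}}^{\lambda _1,\lambda _2}\).

    import numpy as np

    def df_estimate(A, L_tilde, omega_hat, lam2):
        """Unbiased estimate of the degrees of freedom, restricted to the active set."""
        active = np.flatnonzero(omega_hat)                  # active set of the LASSO solution
        A_a = A[:, active]
        K_aa = (A.T @ A + lam2 * L_tilde)[np.ix_(active, active)]
        return np.trace(A_a @ np.linalg.solve(K_aa, A_a.T))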

Cite this article

Chiquet, J., Mary-Huard, T. & Robin, S. Structured regularization for conditional Gaussian graphical models. Stat Comput 27, 789–804 (2017). https://doi.org/10.1007/s11222-016-9654-1
