Robust Sparse Principal Component Regression Under The High Dimensional Elliptical Model
Abstract
In this paper we focus on principal component regression and its application to high dimensional non-Gaussian data. The major contributions are twofold. First, in low dimensions and under the Gaussian model, by borrowing strength from recent developments in minimax optimal principal component estimation, we sharply characterize for the first time the potential advantage of classical principal component regression over least square estimation. Second, we propose and analyze a new robust sparse principal component regression for high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well known distributions such as the multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy tailed and to have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients at the optimal parametric rate and is therefore a good alternative to Gaussian based methods. Experiments on synthetic and real world data are conducted to illustrate the empirical usefulness of the proposed method.
1 Introduction
Principal component regression (PCR) has been widely used in statistics for years (Kendall, 1968). Take classical linear regression with random design for example. Let $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^d$ be $n$ independent realizations of a random vector $\mathbf{X} \in \mathbb{R}^d$ with mean $\mathbf{0}$ and covariance matrix $\mathbf{\Sigma}$. The classical linear regression model and the simple principal component regression model can be elaborated as follows:
\[
\text{(Classical linear regression model)} \quad \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon};
\]
\[
\text{(Principal component regression model)} \quad \mathbf{Y} = \alpha \mathbf{X}\mathbf{u}_1 + \boldsymbol{\epsilon}, \tag{1.1}
\]
where $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^T \in \mathbb{R}^{n \times d}$, $\mathbf{Y} \in \mathbb{R}^n$, $\mathbf{u}_i$ is the $i$-th leading eigenvector of $\mathbf{\Sigma}$, $\boldsymbol{\epsilon} \sim N_n(\mathbf{0}, \sigma^2 \mathbf{I}_n)$ is independent of $\mathbf{X}$, $\boldsymbol{\beta} \in \mathbb{R}^d$, and $\alpha \in \mathbb{R}$. Here $\mathbf{I}_n \in \mathbb{R}^{n \times n}$ is the identity matrix. Principal component regression can then be conducted in two steps: first, we obtain an estimator $\widehat{\mathbf{u}}_1$ of $\mathbf{u}_1$; second, we project the data onto the direction of $\widehat{\mathbf{u}}_1$ and solve a simple linear regression to estimate $\alpha$.
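To make the model concrete, the following minimal Python/NumPy sketch simulates data from the principal component regression model (1.1) under a Gaussian design. The specific values of $n$, $d$, $\alpha$, $\sigma$, and the eigenvalues of $\mathbf{\Sigma}$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha, sigma = 100, 10, 1.0, 1.0            # illustrative choices

# Covariance with a dominant leading eigenvalue (low-rank-like structure).
eigvals = np.array([10.0] + [1.0] * (d - 1))
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthonormal eigenvectors
Sigma = Q @ np.diag(eigvals) @ Q.T
u1 = Q[:, 0]                                      # leading eigenvector of Sigma

# Design matrix with rows N(0, Sigma) and response from model (1.1).
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
eps = sigma * rng.standard_normal(n)
Y = alpha * X @ u1 + eps                          # Y = alpha * X u1 + eps
beta_true = alpha * u1                            # regression coefficient beta = alpha * u1
```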
Inspecting Equation (1.1), it is easy to see that the principal component regression model is a subset of the general linear regression (LR) model with the constraint that the regression coefficient $\boldsymbol{\beta}$ is proportional to $\mathbf{u}_1$. There has been much discussion of the advantages of principal component regression over classical linear regression. In low dimensional settings, Massy (1965) pointed out that principal component regression can be much more efficient than linear regression in handling collinearity among predictors. More recently, Cook (2007) and Artemiou and Li (2009) argued that principal component regression has the potential to play an even more important role. In particular, letting $\widehat{\mathbf{u}}_j$ be the $j$-th leading eigenvector of the sample covariance matrix $\widehat{\mathbf{\Sigma}}$ of $\mathbf{x}_1, \ldots, \mathbf{x}_n$, Artemiou and Li (2009) show that, under mild conditions, with high probability the correlation between the response $\mathbf{Y}$ and $\mathbf{X}\widehat{\mathbf{u}}_i$ is higher than or equal to the correlation between $\mathbf{Y}$ and $\mathbf{X}\widehat{\mathbf{u}}_j$ when $i < j$. This indicates, although not rigorously, that principal component regression can possibly borrow strength from the low rank structure of $\mathbf{\Sigma}$, which motivates our work.
Even though the statistical performance of principal component regression in low dimensions is not fully understood, there is even less analysis of principal component regression in high dimensions, where the dimension $d$ can be exponentially larger than the sample size $n$. This is partially due to the fact that estimating the leading eigenvectors of $\mathbf{\Sigma}$ is itself difficult. For example, Johnstone and Lu (2009) show that, even under the Gaussian model, when $d/n \to \gamma$ for some $\gamma > 0$, there exist multiple settings under which $\widehat{\mathbf{u}}_1$ can be an inconsistent estimator of $\mathbf{u}_1$. To attack this "curse of dimensionality", one solution is to add a sparsity assumption on $\mathbf{u}_1$, leading to various versions of sparse PCA; see Zou et al. (2006), d'Aspremont et al. (2007), and Moghaddam et al. (2006), among others. Under (sub)Gaussian settings, minimax optimal rates have been established for estimating $\mathbf{u}_1, \ldots, \mathbf{u}_m$ (Vu and Lei, 2012; Ma, 2013; Cai et al., 2013). Very recently, Han and Liu (2013b) relaxed the Gaussian assumption by conducting a scale invariant version of sparse PCA (i.e., estimating the leading eigenvector of the correlation instead of the covariance matrix). However, their approach cannot easily be applied to estimate $\mathbf{u}_1$, and the rate of convergence they proved is not the parametric rate.
This paper improves upon the aforementioned results in two directions. First, with regard to classical principal component regression, under a double asymptotic framework in which $d$ is allowed to increase with $n$, and by borrowing very recent developments in principal component analysis (Vershynin, 2010; Lounici, 2012; Bunea and Xiao, 2012), we explicitly show for the first time the advantage of principal component regression over classical linear regression. We explicitly confirm the following two advantages of principal component regression: (i) principal component regression is insensitive to collinearity, while linear regression is very sensitive to it; (ii) principal component regression can utilize the low rank structure of the covariance matrix $\mathbf{\Sigma}$, while linear regression cannot.
Second, in high dimensions where $d$ can increase much faster, even exponentially faster, than $n$, we propose a robust method for conducting (sparse) principal component regression under a non-Gaussian elliptical model. The elliptical distribution is a semiparametric generalization of the Gaussian, relaxing the light tail and zero tail dependence constraints while preserving the symmetry property. We refer to Klüppelberg et al. (2007) for more details. This distribution family includes many well known distributions such as the multivariate Gaussian, rank-deficient Gaussian, t, logistic, and many others. Under the elliptical model, we exploit the result of Han and Liu (2013a), who showed that, by utilizing a robust covariance matrix estimator, the multivariate Kendall's tau, we can obtain an estimator $\widetilde{\mathbf{u}}_1$ that recovers $\mathbf{u}_1$ at the optimal parametric rate shown in Vu and Lei (2012). We then exploit $\widetilde{\mathbf{u}}_1$ in conducting principal component regression and show that the obtained estimator $\check{\boldsymbol{\beta}}$ can estimate $\boldsymbol{\beta}$ at the optimal $\sqrt{s\log d/n}$ rate. The optimal rates in estimating $\mathbf{u}_1$ and $\boldsymbol{\beta}$, combined with the discussion of classical principal component regression, indicate that the proposed method has the potential to handle high dimensional complex data and has advantages over high dimensional linear regression methods such as ridge regression and the lasso. These theoretical results are also backed up by numerical experiments on both synthetic and real world equity data.
Let $\mathrm{Tr}(\mathbf{M})$ be the trace of $\mathbf{M}$. Let $\lambda_j(\mathbf{M})$ be the $j$-th largest eigenvalue of $\mathbf{M}$ and $\mathbf{\Theta}_j(\mathbf{M})$ be the corresponding leading eigenvector. In particular, we let $\lambda_{\max}(\mathbf{M}) := \lambda_1(\mathbf{M})$ and $\lambda_{\min}(\mathbf{M}) := \lambda_d(\mathbf{M})$. We define $\mathbb{S}^{d-1} := \{\mathbf{v} \in \mathbb{R}^d : \|\mathbf{v}\|_2 = 1\}$ to be the $d$-dimensional unit sphere. We define the matrix $\ell_{\max}$ norm and $\ell_2$ norm as $\|\mathbf{M}\|_{\max} := \max\{|\mathbf{M}_{ij}|\}$ and $\|\mathbf{M}\|_2 := \sup_{\mathbf{v} \in \mathbb{S}^{d-1}} \|\mathbf{M}\mathbf{v}\|_2$. We define $\mathrm{diag}(\mathbf{M})$ to be the diagonal matrix with $[\mathrm{diag}(\mathbf{M})]_{jj} = \mathbf{M}_{jj}$ for $j = 1, \ldots, d$. We denote $\mathrm{vec}(\mathbf{M}) := (\mathbf{M}_{*1}^T, \ldots, \mathbf{M}_{*d}^T)^T$. For any two sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n \overset{c,C}{\asymp} b_n$ if there exist two fixed constants $c, C$ such that $c \le a_n/b_n \le C$.
2 Classical Principal Component Regression

Let $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^d$ be $n$ independent observations of a $d$-dimensional random vector $\mathbf{X} \sim N_d(\mathbf{0}, \mathbf{\Sigma})$, let $\mathbf{u}_1 := \mathbf{\Theta}_1(\mathbf{\Sigma})$, and let $\epsilon_1, \ldots, \epsilon_n \sim N_1(0, \sigma^2)$ be independent of each other and of $\{\mathbf{x}_i\}_{i=1}^n$. We suppose that the following principal component regression model holds:
\[
\mathbf{Y} = \alpha \mathbf{X}\mathbf{u}_1 + \boldsymbol{\epsilon}, \tag{2.1}
\]
where $\mathbf{Y} = (Y_1, \ldots, Y_n)^T \in \mathbb{R}^n$, $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n]^T \in \mathbb{R}^{n \times d}$, and $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_n)^T \in \mathbb{R}^n$. We are interested in estimating the regression coefficient $\boldsymbol{\beta} := \alpha\mathbf{u}_1$.
Let $\widehat{\boldsymbol{\beta}}$ represent the classical least square estimator, obtained without taking into account the information that $\boldsymbol{\beta}$ is proportional to $\mathbf{u}_1$. It can be expressed as
\[
\widehat{\boldsymbol{\beta}} := (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}. \tag{2.2}
\]
We then have the following proposition, which shows that the mean square error of $\widehat{\boldsymbol{\beta}}$ is highly related to the scale of $\lambda_{\min}(\mathbf{\Sigma})$.

Proposition 2.1. Under the principal component regression model shown in (2.1), we have
\[
\mathbb{E}\|\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\|_2^2 = \frac{\sigma^2}{n - d - 1}\left(\frac{1}{\lambda_1(\mathbf{\Sigma})} + \cdots + \frac{1}{\lambda_d(\mathbf{\Sigma})}\right).
\]
Proposition 2.1 reflects the vulnerability of the least square estimator to collinearity. More specifically, when $\lambda_d(\mathbf{\Sigma})$ is extremely small, going to zero at the scale of $O(1/n)$, $\widehat{\boldsymbol{\beta}}$ can be an inconsistent estimator even when $d$ is fixed. On the other hand, using the Markov inequality, when $\lambda_d(\mathbf{\Sigma})$ is lower bounded by a fixed constant and $d = o(n)$, the rate of convergence of $\widehat{\boldsymbol{\beta}}$ is well known to be $O_P(\sqrt{d/n})$.
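As a quick sanity check of Proposition 2.1, the following sketch compares the empirical mean square error of the least square estimator with the closed-form expression above; the sample size, dimension, eigenvalues, and number of replications are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, alpha, sigma = 100, 10, 1.0, 1.0
eigvals = np.array([10.0] + [1.0] * (d - 1))      # lambda_1, ..., lambda_d
Sigma = np.diag(eigvals)
u1 = np.eye(d)[:, 0]
beta = alpha * u1

mse = []
for _ in range(2000):
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    Y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]   # (X^T X)^{-1} X^T Y
    mse.append(np.sum((beta_hat - beta) ** 2))

theory = sigma**2 / (n - d - 1) * np.sum(1.0 / eigvals)  # Proposition 2.1 formula
print(np.mean(mse), theory)   # the two numbers should be close up to Monte Carlo error
```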
Motivated by Equation (2.1), the classical principal component regression estimator can be elaborated as follows.

(1) We first estimate $\mathbf{u}_1$ by the leading eigenvector $\widehat{\mathbf{u}}_1$ of the sample covariance matrix $\widehat{\mathbf{\Sigma}} := \frac{1}{n}\sum_i \mathbf{x}_i\mathbf{x}_i^T$.

(2) We then estimate $\alpha \in \mathbb{R}$ in Equation (2.1) by standard least square estimation on the projected data $\widehat{\mathbf{Z}} := \mathbf{X}\widehat{\mathbf{u}}_1 \in \mathbb{R}^n$:
\[
\widetilde{\alpha} := (\widehat{\mathbf{Z}}^T\widehat{\mathbf{Z}})^{-1}\widehat{\mathbf{Z}}^T\mathbf{Y}.
\]
The final principal component regression estimator $\widetilde{\boldsymbol{\beta}}$ is then obtained as $\widetilde{\boldsymbol{\beta}} = \widetilde{\alpha}\widehat{\mathbf{u}}_1$. We then have the following important theorem, which provides a rate of convergence of $\widetilde{\boldsymbol{\beta}}$ to $\boldsymbol{\beta}$.
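For concreteness, the two-step classical principal component regression estimator described above might be sketched in NumPy as follows, assuming `X` and `Y` have been generated as in the earlier simulation snippet.

```python
import numpy as np

def classical_pcr(X, Y):
    """Two-step classical PCR sketch: (1) leading eigenvector of the sample covariance,
    (2) least squares of Y on the projected data Z = X u1_hat."""
    n = X.shape[0]
    Sigma_hat = X.T @ X / n                      # sample covariance (mean-zero design)
    _, eigvecs = np.linalg.eigh(Sigma_hat)
    u1_hat = eigvecs[:, -1]                      # leading eigenvector
    Z = X @ u1_hat                               # projected data, in R^n
    alpha_tilde = (Z @ Y) / (Z @ Z)              # (Z^T Z)^{-1} Z^T Y
    return alpha_tilde * u1_hat                  # beta_tilde = alpha_tilde * u1_hat
```

Note that `u1_hat` is only identifiable up to sign; the sign ambiguity is absorbed by the sign of `alpha_tilde`, so the returned estimator is unaffected.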
Theorem 2.2. Let $r^*(\mathbf{\Sigma}) := \mathrm{Tr}(\mathbf{\Sigma})/\lambda_{\max}(\mathbf{\Sigma})$ represent the effective rank of $\mathbf{\Sigma}$ (Vershynin, 2010). Suppose that
\[
\|\mathbf{\Sigma}\|_2 \cdot \sqrt{\frac{r^*(\mathbf{\Sigma})\log d}{n}} = o(1).
\]
Under Model (2.1), when $\lambda_{\max}(\mathbf{\Sigma}) > c_1$ and $\lambda_2(\mathbf{\Sigma})/\lambda_1(\mathbf{\Sigma}) < C_1 < 1$ for some fixed constants $c_1$ and $C_1$, we have
\[
\|\widetilde{\boldsymbol{\beta}} - \boldsymbol{\beta}\|_2 = O_P\left(\sqrt{\frac{1}{n}} + \left(\alpha + \frac{1}{\sqrt{\lambda_{\max}(\mathbf{\Sigma})}}\right)\cdot\sqrt{\frac{r^*(\mathbf{\Sigma})\log d}{n}}\right). \tag{2.3}
\]
Theorem 2.2, compared with Proposition 2.1, conveys several important messages about the performance of principal component regression. First, compared with the least square estimator $\widehat{\boldsymbol{\beta}}$, $\widetilde{\boldsymbol{\beta}}$ is insensitive to collinearity in the sense that $\lambda_{\min}(\mathbf{\Sigma})$ plays no role in its rate of convergence. Second, when $\lambda_{\min}(\mathbf{\Sigma})$ is lower bounded by a fixed constant and $\alpha$ is upper bounded by a fixed constant, the rate of convergence of $\widehat{\boldsymbol{\beta}}$ is $O_P(\sqrt{d/n})$ while that of $\widetilde{\boldsymbol{\beta}}$ is $O_P(\sqrt{r^*(\mathbf{\Sigma})\log d/n})$, where $r^*(\mathbf{\Sigma}) := \mathrm{Tr}(\mathbf{\Sigma})/\lambda_{\max}(\mathbf{\Sigma}) \le d$ and is of order $o(d)$ when $\mathbf{\Sigma}$ has a low rank structure. These two observations, taken together, illustrate the advantages of classical principal component regression over least square estimation and justify its use. One more point is worth noting: the performance of $\widetilde{\boldsymbol{\beta}}$, unlike that of $\widehat{\boldsymbol{\beta}}$, depends on $\alpha$. When $\alpha$ is small, $\widetilde{\boldsymbol{\beta}}$ can predict more accurately.
These three observations are verified in Figure 1. Here the data are generated according to Equation (2.1) with $n = 100$, $d = 10$, $\sigma^2 = 1$, and $\mathbf{\Sigma}$ a diagonal matrix with descending diagonal values $\mathbf{\Sigma}_{ii} = \lambda_i$. In Figure 1(A), we set $\alpha = 1$, $\lambda_1 = 10$, $\lambda_j = 1$ for $j = 2, \ldots, d-1$, and vary $\lambda_d$ from $1$ to $1/100$; in Figure 1(B), we set $\alpha = 1$, $\lambda_j = 1$ for $j = 2, \ldots, d$, and vary $\lambda_1$ from $1$ to $100$; in Figure 1(C), we set $\lambda_1 = 10$, $\lambda_j = 1$ for $j = 2, \ldots, d$, and vary $\alpha$ from $0.1$ to $10$. In the three panels, the empirical mean square error is plotted against $1/\lambda_d$, $\lambda_1$, and $\alpha$, respectively. The results match the theory in each case.
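The simulation behind panel (A) can be reproduced, up to Monte Carlo noise, with a short script of the following form; the number of replications and the grid of $\lambda_d$ values are illustrative assumptions on our part.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, alpha, sigma = 100, 10, 1.0, 1.0

def mse_lr_vs_pcr(lambda_d, reps=200):
    """Empirical MSE of least squares (LR) and classical PCR as lambda_d shrinks."""
    eigvals = np.array([10.0] + [1.0] * (d - 2) + [lambda_d])
    Sigma, u1 = np.diag(eigvals), np.eye(d)[:, 0]
    beta = alpha * u1
    err_lr, err_pcr = [], []
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
        Y = X @ beta + sigma * rng.standard_normal(n)
        beta_lr = np.linalg.lstsq(X, Y, rcond=None)[0]
        w = np.linalg.eigh(X.T @ X / n)[1][:, -1]        # leading sample eigenvector
        Z = X @ w
        beta_pcr = ((Z @ Y) / (Z @ Z)) * w
        err_lr.append(np.sum((beta_lr - beta) ** 2))
        err_pcr.append(np.sum((beta_pcr - beta) ** 2))
    return np.mean(err_lr), np.mean(err_pcr)

for lam_d in [1.0, 0.1, 0.05, 0.01]:
    print(lam_d, mse_lr_vs_pcr(lam_d))   # LR error grows as lambda_d -> 0; PCR error does not
```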
Figure 1: Justification of Proposition 2.1 and Theorem 2.2. The empirical mean square errors are plotted against $1/\lambda_d$, $\lambda_1$, and $\alpha$ in panels (A), (B), and (C), respectively. The results of classical linear regression (LR) and principal component regression (PCR) are marked by a black solid line and a red dotted line.
We would like to point out that the elliptical family is significantly larger than the Gaussian family. In fact, the Gaussian is fully parameterized by finite dimensional parameters (mean and covariance). In contrast, the elliptical is a semiparametric family, since the elliptical density can be represented as $g((\mathbf{x} - \boldsymbol{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu}))$, where the function $g(\cdot)$ is completely unspecified. If we consider the "volumes" of the elliptical family and the Gaussian family with respect to the Lebesgue reference measure, the volume of the Gaussian family is zero (like a line in a three-dimensional space), while the volume of the elliptical family is positive (like a ball in a three-dimensional space).
3.2 Multivariate Kendall’s tau
As an important step in conducting principal component regression, we need to estimate $\mathbf{u}_1 = \mathbf{\Theta}_1(\mathrm{Cov}(\mathbf{X})) = \mathbf{\Theta}_1(\mathbf{\Sigma})$ as accurately as possible. Since the random variable $\xi$ in Equation (3.1) can be very heavy tailed, the corresponding elliptically distributed random vector can be heavy tailed as well. Therefore, as has been pointed out by various authors (Tyler, 1987; Croux et al., 2002; Han and Liu, 2013b), the leading eigenvector of the sample covariance matrix $\widehat{\mathbf{\Sigma}}$ can be a poor estimator of $\mathbf{u}_1 = \mathbf{\Theta}_1(\mathbf{\Sigma})$ under the elliptical distribution. This motivates the development of a robust estimator.
In particular, in this paper we consider the multivariate Kendall's tau, proposed by Choi and Marden (1998) and recently studied in depth by Han and Liu (2013a). In the following we give a brief description of this estimator. Let $\mathbf{X} \sim EC_d(\boldsymbol{\mu}, \mathbf{\Sigma}, \xi)$ and let $\widetilde{\mathbf{X}}$ be an independent copy of $\mathbf{X}$. The population multivariate Kendall's tau matrix, denoted by $\mathbf{K} \in \mathbb{R}^{d\times d}$, is defined as
\[
\mathbf{K} := \mathbb{E}\left(\frac{(\mathbf{X} - \widetilde{\mathbf{X}})(\mathbf{X} - \widetilde{\mathbf{X}})^T}{\|\mathbf{X} - \widetilde{\mathbf{X}}\|_2^2}\right). \tag{3.2}
\]
Its sample counterpart, the sample multivariate Kendall's tau matrix, is the second-order U-statistic
\[
\widehat{\mathbf{K}} := \frac{2}{n(n-1)}\sum_{i < i'}\frac{(\mathbf{x}_i - \mathbf{x}_{i'})(\mathbf{x}_i - \mathbf{x}_{i'})^T}{\|\mathbf{x}_i - \mathbf{x}_{i'}\|_2^2}, \tag{3.3}
\]
and we have that $\mathbb{E}(\widehat{\mathbf{K}}) = \mathbf{K}$. It is easy to see that $\max_{jk}|\widehat{\mathbf{K}}_{jk}| \le 1$ and $\max_{jk}|\mathbf{K}_{jk}| \le 1$. Therefore, $\widehat{\mathbf{K}}$ is a bounded matrix and hence can be a nicer statistic to work with than the sample covariance matrix. Moreover, we have the following important proposition, coming from Oja (2010), showing that $\mathbf{K}$ has the same eigenspace as $\mathbf{\Sigma}$ and $\mathrm{Cov}(\mathbf{X})$.
Proposition 3.1 (Oja (2010)). Let $\mathbf{X} \sim EC_d(\boldsymbol{\mu}, \mathbf{\Sigma}, \xi)$ be continuous and let $\mathbf{K}$ be the population multivariate Kendall's tau matrix. Then, if $\lambda_j(\mathbf{\Sigma}) \neq \lambda_k(\mathbf{\Sigma})$ for any $k \neq j$, we have
\[
\mathbf{\Theta}_j(\mathbf{\Sigma}) = \mathbf{\Theta}_j(\mathbf{K}) \quad\text{and}\quad \lambda_j(\mathbf{K}) = \mathbb{E}\left(\frac{\lambda_j(\mathbf{\Sigma})U_j^2}{\lambda_1(\mathbf{\Sigma})U_1^2 + \cdots + \lambda_d(\mathbf{\Sigma})U_d^2}\right), \tag{3.4}
\]
where $(U_1, \ldots, U_d)^T$ is uniformly distributed on the unit sphere $\mathbb{S}^{d-1}$.
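A direct (and deliberately simple, $O(n^2 d^2)$) implementation of the sample multivariate Kendall's tau and of the eigenvector extraction suggested by Proposition 3.1 might look as follows; this is a plain sketch, not the authors' code.

```python
import numpy as np

def multivariate_kendalls_tau(X):
    """Sample multivariate Kendall's tau matrix K_hat for an n x d data matrix X."""
    n, d = X.shape
    K = np.zeros((d, d))
    for i in range(n):
        for j in range(i + 1, n):
            diff = X[i] - X[j]
            sq = diff @ diff
            if sq > 0:                       # skip ties (zero pairwise difference)
                K += np.outer(diff, diff) / sq
    return 2.0 * K / (n * (n - 1))

def leading_eigvec_kendall(X):
    """Estimate Theta_1(Sigma) by the leading eigenvector of K_hat (Proposition 3.1)."""
    K = multivariate_kendalls_tau(X)
    _, vecs = np.linalg.eigh(K)
    return vecs[:, -1]
```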
The robust sparse principal component regression can then be elaborated as a two-step procedure:

(i) Inspired by the model $M_d(\mathbf{Y}, \boldsymbol{\epsilon}; \mathbf{\Sigma}, \xi, s)$ and Proposition 3.1, we consider the following optimization problem for estimating $\mathbf{u}_1 := \mathbf{\Theta}_1(\mathbf{\Sigma})$:
\[
\widetilde{\mathbf{u}}_1 = \underset{\mathbf{v} \in \mathbb{R}^d}{\arg\max}\ \mathbf{v}^T\widehat{\mathbf{K}}\mathbf{v}, \quad\text{subject to}\quad \mathbf{v} \in \mathbb{S}^{d-1} \cap B_0(s), \tag{3.6}
\]
where $B_0(s) := \{\mathbf{v} \in \mathbb{R}^d : \|\mathbf{v}\|_0 \le s\}$ and $\widehat{\mathbf{K}}$ is the sample multivariate Kendall's tau matrix. The corresponding global optimum is denoted by $\widetilde{\mathbf{u}}_1$. By Proposition 3.1, $\widetilde{\mathbf{u}}_1$ is also an estimator of $\mathbf{\Theta}_1(\mathrm{Cov}(\mathbf{X}))$ whenever the covariance matrix exists.
(ii) We then estimate $\alpha \in \mathbb{R}$ in Equation (3.5) by standard least square estimation on the projected data $\widetilde{\mathbf{Z}} := \mathbf{X}\widetilde{\mathbf{u}}_1 \in \mathbb{R}^n$:
\[
\check{\alpha} := (\widetilde{\mathbf{Z}}^T\widetilde{\mathbf{Z}})^{-1}\widetilde{\mathbf{Z}}^T\mathbf{Y}.
\]
The final principal component regression estimator $\check{\boldsymbol{\beta}}$ is then obtained as $\check{\boldsymbol{\beta}} = \check{\alpha}\widetilde{\mathbf{u}}_1$.
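Putting the pieces together, the two-step robust sparse principal component regression can be sketched as below. The sparse eigenvector step is written against a generic `sparse_leading_eigvec(K, s)` routine; any sparse PCA solver applied to $\widehat{\mathbf{K}}$, such as the truncated power method sketched later in Section 4, can play this role. The helper name is ours, not the paper's.

```python
import numpy as np

def robust_sparse_pcr(X, Y, s, sparse_leading_eigvec):
    """Two-step RPCR sketch:
    (i)  u1_tilde = s-sparse leading eigenvector of the multivariate Kendall's tau K_hat,
    (ii) least squares of Y on the projected data Z_tilde = X u1_tilde."""
    n, d = X.shape
    K = np.zeros((d, d))
    for i in range(n):                         # sample multivariate Kendall's tau K_hat
        for j in range(i + 1, n):
            diff = X[i] - X[j]
            sq = diff @ diff
            if sq > 0:
                K += np.outer(diff, diff) / sq
    K *= 2.0 / (n * (n - 1))
    u1_tilde = sparse_leading_eigvec(K, s)     # step (i): approximately solve (3.6)
    Z = X @ u1_tilde                           # step (ii): project and regress
    alpha_check = (Z @ Y) / (Z @ Z)
    return alpha_check * u1_tilde              # beta_check = alpha_check * u1_tilde
```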
3.4 Theoretical Property
In Theorem 2.2, we showed that the accuracy with which $\mathbf{u}_1$ is estimated plays an important role in conducting principal component regression. Following this discussion and the very recent results in Han and Liu (2013a), the following "easiest" and "hardest" conditions are considered. Here $\kappa_L, \kappa_U$ are two constants larger than 1.

Condition 1 ("Easiest"): $\lambda_1(\mathbf{\Sigma}) \overset{1,\kappa_U}{\asymp} d\,\lambda_j(\mathbf{\Sigma})$ for any $j \in \{2, \ldots, d\}$ and $\lambda_2(\mathbf{\Sigma}) \overset{1,\kappa_U}{\asymp} \lambda_j(\mathbf{\Sigma})$ for any $j \in \{3, \ldots, d\}$;

Condition 2 ("Hardest"): $\lambda_1(\mathbf{\Sigma}) \overset{\kappa_L,\kappa_U}{\asymp} \lambda_j(\mathbf{\Sigma})$ for any $j \in \{2, \ldots, d\}$.

In the sequel, we say that the model $M_d(\mathbf{Y}, \boldsymbol{\epsilon}; \mathbf{\Sigma}, \xi, s)$ holds if the data $(\mathbf{Y}, \mathbf{X})$ are generated from the model $M_d(\mathbf{Y}, \boldsymbol{\epsilon}; \mathbf{\Sigma}, \xi, s)$.
Under Conditions 1 and 2, we then have the following theorem, which shows that, under certain conditions, $\|\check{\boldsymbol{\beta}} - \boldsymbol{\beta}\|_2 = O_P(\sqrt{s\log d/n})$, which is the optimal parametric rate for estimating the regression coefficient (Ravikumar et al., 2008).

Theorem 3.2. Let the model $M_d(\mathbf{Y}, \boldsymbol{\epsilon}; \mathbf{\Sigma}, \xi, s)$ hold, let $|\alpha|$ in Equation (3.5) be upper bounded by a constant, and let $\|\mathbf{\Sigma}\|_2$ be lower bounded by a constant. Then, under Condition 1 or Condition 2, and for any random vector $\mathbf{X}$ such that
\[
\max_{\mathbf{v} \in \mathbb{S}^{d-1},\, \|\mathbf{v}\|_0 \le 2s} |\mathbf{v}^T(\widehat{\mathbf{\Sigma}} - \mathbf{\Sigma})\mathbf{v}| = o_P(1),
\]
we have $\|\check{\boldsymbol{\beta}} - \boldsymbol{\beta}\|_2 = O_P(\sqrt{s\log d/n})$.
Figure 2: Curves of averaged estimation error between the estimates and the true parameters for different distributions (normal, multivariate-t, EC1, and EC2, from left to right) using the truncated power method. Here $n = 100$, $d = 200$, and we are interested in estimating the regression coefficient $\boldsymbol{\beta}$. The horizontal axis represents the cardinality of the estimates' support sets and the vertical axis represents the empirical mean square error. From left to right, the minimum mean square errors for the lasso are 0.53, 0.55, 1, and 1.
4 Experiments
In this section we conduct studies on both synthetic and real-world data to investigate the empirical performance of the robust sparse principal component regression proposed in this paper. We use the truncated power algorithm proposed in Yuan and Zhang (2013) to approximate the global optimum $\widetilde{\mathbf{u}}_1$ of (3.6). Here the cardinalities of the support sets of the leading eigenvectors are treated as tuning parameters. The following three methods are considered:
lasso: the classical $L_1$-penalized regression;

PCR: sparse principal component regression using the sample covariance matrix as the sufficient statistic and exploiting the truncated power algorithm to estimate $\mathbf{u}_1$;

RPCR: the robust sparse principal component regression proposed in this paper, using the multivariate Kendall's tau as the sufficient statistic and exploiting the truncated power algorithm to estimate $\mathbf{u}_1$.
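The truncated power algorithm of Yuan and Zhang (2013) alternates a power iteration step with hard truncation to the $s$ largest entries in magnitude. A bare-bones version, usable as the `sparse_leading_eigvec` routine in the RPCR sketch above, is given below; the initialization rule and iteration count are simple assumptions on our part, not prescriptions from the paper.

```python
import numpy as np

def truncated_power_method(K, s, n_iter=100):
    """Approximate the s-sparse leading eigenvector of a symmetric PSD matrix K
    (in the spirit of Yuan and Zhang, 2013): power step, keep the s
    largest-magnitude entries, renormalize."""
    d = K.shape[0]
    # Initialize on the coordinate with the largest diagonal entry (a simple heuristic).
    v = np.zeros(d)
    v[np.argmax(np.diag(K))] = 1.0
    for _ in range(n_iter):
        w = K @ v                                   # power iteration step
        keep = np.argsort(np.abs(w))[-s:]           # indices of the s largest |entries|
        v_new = np.zeros(d)
        v_new[keep] = w[keep]
        norm = np.linalg.norm(v_new)
        if norm == 0:
            break
        v_new /= norm                               # project back to the unit sphere
        if np.linalg.norm(v_new - v) < 1e-10:
            v = v_new
            break
        v = v_new
    return v
```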
We first conduct a simulation study to back up the theoretical results and to further investigate the empirical performance of the proposed robust sparse principal component regression method. To illustrate the empirical usefulness of the proposed method, we first consider generating the data matrix $\mathbf{X}$. To generate $\mathbf{X}$, we need to specify $\mathbf{\Sigma}$ and $\xi$. In detail, let $\omega_1 > \omega_2 > \omega_3 = \ldots = \omega_d$ be the eigenvalues and $\mathbf{u}_1, \ldots, \mathbf{u}_d$ the eigenvectors of $\mathbf{\Sigma}$, with $\mathbf{u}_j := (u_{j1}, \ldots, u_{jd})^T$. The top two leading eigenvectors $\mathbf{u}_1, \mathbf{u}_2$ of $\mathbf{\Sigma}$ are specified to be sparse with $s_j := \|\mathbf{u}_j\|_0$ and $u_{jk} = 1/\sqrt{s_j}$ for $k \in [1 + \sum_{i=1}^{j-1}s_i,\ \sum_{i=1}^j s_i]$ and zero otherwise. $\mathbf{\Sigma}$ is then generated as $\mathbf{\Sigma} = \sum_{j=1}^2(\omega_j - \omega_d)\mathbf{u}_j\mathbf{u}_j^T + \omega_d\mathbf{I}_d$. Across all settings, we let $s_1 = s_2 = 10$, $\omega_1 = 5.5$, $\omega_2 = 2.5$, and $\omega_j = 0.5$ for all $j = 3, \ldots, d$. With $\mathbf{\Sigma}$ in hand, we then consider the following four different elliptical distributions (a data-generation sketch for the first two settings is given below):
(Normal) $\mathbf{X} \sim EC_d(\mathbf{0}, \mathbf{\Sigma}, \zeta_1)$ with $\zeta_1 \overset{d}{=} \chi_d$. Here $\chi_d$ is the chi distribution with $d$ degrees of freedom: for $Y_1, \ldots, Y_d \overset{i.i.d.}{\sim} N(0,1)$, $\sqrt{Y_1^2 + \ldots + Y_d^2} \overset{d}{=} \chi_d$. In this setting, $\mathbf{X}$ follows a Gaussian distribution (Fang et al., 1990).

(Multivariate-t) $\mathbf{X} \sim EC_d(\mathbf{0}, \mathbf{\Sigma}, \zeta_2)$ with $\zeta_2 \overset{d}{=} \sqrt{\nu}\,\xi_1^*/\xi_2^*$, where $\xi_1^* \overset{d}{=} \chi_d$ and $\xi_2^* \overset{d}{=} \chi_\nu$ are independent, with $\nu \in \mathbb{Z}_+$. In this setting, $\mathbf{X}$ follows a multivariate-t distribution with $\nu$ degrees of freedom (Fang et al., 1990).
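A sketch of the synthetic data generation, covering the construction of $\mathbf{\Sigma}$ described above and the Normal and multivariate-t elliptical designs, is given below. The degrees of freedom $\nu = 3$ is an illustrative assumption; $n = 100$ and $d = 200$ follow the setting reported in Figure 2.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, nu = 100, 200, 3        # nu: multivariate-t degrees of freedom (illustrative)

# Sparse leading eigenvectors u1, u2 with disjoint supports of size 10 each.
s1 = s2 = 10
u1 = np.zeros(d); u1[:s1] = 1.0 / np.sqrt(s1)
u2 = np.zeros(d); u2[s1:s1 + s2] = 1.0 / np.sqrt(s2)
w1, w2, wd = 5.5, 2.5, 0.5
Sigma = (w1 - wd) * np.outer(u1, u1) + (w2 - wd) * np.outer(u2, u2) + wd * np.eye(d)

A = np.linalg.cholesky(Sigma)                     # Sigma = A A^T

def sample_elliptical(xi):
    """Draw samples of X = xi * A U, with U uniform on the unit sphere S^{d-1}."""
    G = rng.standard_normal((len(xi), d))
    U = G / np.linalg.norm(G, axis=1, keepdims=True)
    return xi.reshape(-1, 1) * (U @ A.T)

# Normal design: xi ~ chi_d, so X ~ N(0, Sigma).
X_normal = sample_elliptical(np.sqrt(rng.chisquare(d, size=n)))

# Multivariate-t design: xi = sqrt(nu) * chi_d / chi_nu.
xi_t = np.sqrt(nu) * np.sqrt(rng.chisquare(d, size=n)) / np.sqrt(rng.chisquare(nu, size=n))
X_t = sample_elliptical(xi_t)
```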
[Figure omitted in extraction: only its axis label "Sample Quantiles" and a legend comparing lasso, PCR, and RPCR are recoverable.]
Acknowledgement
Han's research is supported by a Google Fellowship. Liu is supported by NSF Grants III-1116730 and NSF III-1332109, an NIH sub-award, and an FDA sub-award from Johns Hopkins University.
References
Artemiou, A. and Li, B. (2009). On principal components and regression: a statistical explanation of a natural
phenomenon. Statistica Sinica, 19(4):1557.
Bunea, F. and Xiao, L. (2012). On the sample covariance matrix estimator of reduced effective rank population
matrices, with applications to fPCA. arXiv preprint arXiv:1212.5321.
Cai, T. T., Ma, Z., and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. The Annals of
Statistics (to appear).
Choi, K. and Marden, J. (1998). A multivariate version of Kendall's τ. Journal of Nonparametric Statistics, 9(3):261–293.
Cook, R. D. (2007). Fisher lecture: Dimension reduction in regression. Statistical Science, 22(1):1–26.
Croux, C., Ollila, E., and Oja, H. (2002). Sign and rank covariance matrices: statistical properties and ap-
plication to principal components analysis. In Statistical data analysis based on the L1-norm and related
methods, pages 257–269. Springer.
d'Aspremont, A., El Ghaoui, L., Jordan, M. I., and Lanckriet, G. R. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Review, 49(3):434–448.
Fang, K., Kotz, S., and Ng, K. (1990). Symmetric Multivariate and Related Distributions. Chapman & Hall, London.
Han, F. and Liu, H. (2013a). Optimal sparse principal component analysis in high dimensional elliptical model.
arXiv preprint arXiv:1310.3561.
Han, F. and Liu, H. (2013b). Scale-invariant sparse PCA on high dimensional meta-elliptical data. Journal of
the American Statistical Association (in press).
Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high
dimensions. Journal of the American Statistical Association, 104(486).
Kendall, M. G. (1968). A course in multivariate analysis.
Klüppelberg, C., Kuhn, G., and Peng, L. (2007). Estimating the tail dependence function of an elliptical
distribution. Bernoulli, 13(1):229–251.
Lounici, K. (2012). Sparse principal component analysis with missing observations. arXiv preprint
arXiv:1205.7060.
Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. The Annals of Statistics (to appear).
Massy, W. F. (1965). Principal components regression in exploratory statistical research. Journal of the Amer-
ican Statistical Association, 60(309):234–256.
Moghaddam, B., Weiss, Y., and Avidan, S. (2006). Spectral bounds for sparse PCA: Exact and greedy algorithms. Advances in Neural Information Processing Systems, 18:915.
Oja, H. (2010). Multivariate Nonparametric Methods with R: An approach based on spatial signs and ranks,
volume 199. Springer.
Ravikumar, P., Raskutti, G., Wainwright, M., and Yu, B. (2008). Model selection in Gaussian graphical models: High-dimensional consistency of ℓ1-regularized MLE. Advances in Neural Information Processing Systems (NIPS), 21.
Tyler, D. E. (1987). A distribution-free M-estimator of multivariate scatter. The Annals of Statistics, 15(1):234–251.
Vershynin, R. (2010). Introduction to the non-asymptotic analysis of random matrices. arXiv preprint
arXiv:1011.3027.
Vu, V. Q. and Lei, J. (2012). Minimax rates of estimation for sparse PCA in high dimensions. Journal of Machine Learning Research (AISTATS Track).
Yuan, X. and Zhang, T. (2013). Truncated power method for sparse eigenvalue problems. Journal of Machine
Learning Research, 14:899–925.
Zou, H., Hastie, T., and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265–286.