Keywords:
Multiple linear regression models
Parallel computing
Maximum likelihood estimator
Consistency
Outlier
1. Introduction
Multiple linear regression models (MLRMs) are widely used in many statistical problems. Several parallel methods and criteria for choosing subsets have been proposed in recent years (see [1]) to improve run-time and computational efficiency. We provide a brief overview of parallel methods that are useful for solving MLRMs.
Mitchell and Beauchamp [2] created a parallel method for the subset selection problem using a Bayesian perspective.
Havránek and Stratkoš [3] considered parallel methods for the Cholesky factorization in multiple linear models, and showed
that the methods’ performance is independent of the size of data sets.
Xu et al. [4] suggested a form of stochastic domain decomposition in multiple linear models to improve performance and to resist processor failure. The general concept of domain decomposition is to decompose the data so that the processors receive data sets of nearly equal size and computation time. The importance of data size during parallel communication was also considered.
Skvoretz et al. [5] experimented with MLRMs in social science research. The parallel component of their computation was the calculation of a covariance matrix using a single-program, multiple-data stream. Different numbers of processors were
used in the experiments, and varying amounts of data were read from disk. They found that the latter consideration was critical for obtaining good performance.
Bouyouli et al. [6] developed global minimal residual and global orthogonal residual methods for MLRMs, all of which are parallel block Krylov subspace methods.
MLRMs allow for a highly effective parallel implementation, elegantly illustrating our point and encouraging further development in theory and application. This work originates from the statistical analysis of multiple linear models [7] in statistical tests and from several examples of the parallel maximum likelihood estimator (PMLE). We study the properties of stochastic domain decomposition for the maximum likelihood estimator (MLE) in multiple linear models.
We provide a general MLE in multiple linear models. Suppose that MLRMs have the following form:
Y = Xβ + ε,  ε ∼ N(0, σ²I),    (1.1)
where X ∈ R^{n×p} is a known matrix of fixed rank, rank(X) = p, p ≪ n, Y ∈ R^{n×1} is an observable random vector, β ∈ R^{p×1} is a vector of unknown parameters, I ∈ R^{n×n} is the identity matrix, and σ² is a positive unknown parameter.
The MLE is often used to estimate unknown parameters in multiple linear models. The MLE of β under the model (1.1)
is defined as
β̂ = arg min_β (Y − Xβ)^T (Y − Xβ).    (1.2)

We then have

β̂ = (X^T X)^{−1} X^T Y.    (1.3)
The PMLE method for (1.1) is as follows: first, (X, Y) is sent to each of the r processors; second, different elements of (X, Y) are obtained by stochastic domain decomposition on each processor, denoted (Xi, Yi); the MLE is then computed on each processor, and the PMLE is obtained by combining these estimators. The PMLE method is a domain decomposition method and has a short run-time on large data sets. Although a number of methods already exist for MLRMs, the proposed method is faster and, in some cases, more robust. A small computational sketch of the procedure is given below.
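The following R sketch mimics the PMLE on a single machine: it draws r random subsamples, computes the MLE on each, and averages the r estimates (the averaging matches the form of β̃ used in the Appendix). The function name pmle and the uniform row sampling are illustrative assumptions, not the authors' code.

```r
## Minimal serial sketch of the PMLE idea; pmle() and the uniform row sampling
## are illustrative assumptions, not the authors' implementation.
pmle <- function(X, Y, r, n0) {
  n <- nrow(X)
  beta_hats <- lapply(seq_len(r), function(i) {
    rows <- sample.int(n, n0)                 # stochastic domain decomposition
    Xi <- X[rows, , drop = FALSE]             # X_i = R_i X
    Yi <- Y[rows]                             # Y_i = R_i Y
    solve(crossprod(Xi), crossprod(Xi, Yi))   # MLE on the i-th subsample
  })
  list(beta = Reduce(`+`, beta_hats) / r,     # PMLE: average of the r estimates
       beta_hats = beta_hats)
}
```

On a real cluster the r subsample fits would run on separate processors; the serial loop above only illustrates the estimator itself.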
We organize the rest of this paper as follows. In Section 2, we introduce the PMLE method (see [8]) and provide an equivalence condition between the PMLE and a generalized least squares (GLS) estimator. In Section 3, we first consider the rank of the projections in the PMLE method, followed by the eigenvalue bounds. We study the consistency of the PMLE in Section 4. In Section 5, we illustrate the method through several experimental studies, including experiments on consistency, outliers, and scalability. Experiments with bankruptcy data are also provided. Section 6 discusses future research. The Appendix lists the technical results.
2. PMLE of MLRMs
In this section, we introduce the matrix form of the proposed PMLE in (1.1). We assume that Xi (i = 1, …, r) are subsamples of the observed sample X, where X ∈ R^{n×p}. Write

Xi = Ri X,  Ei = Ri^T Ri,  Ei = diag{α1, α2, …, αn},  rank{Ei} = n0 ≥ p,  i = 1, …, r.    (2.1)

Here, Ri is the projection operator, Σ_{i=1}^{n} αi = n0, αi ∼ B(n0, 1/n), and E(αi) = n0/n. Note that

I ≤ Σ_{i=1}^{r} Ei ≤ qI.    (2.2)
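As a small numerical check of (2.1), the following R lines build one selection operator Ri and verify that Ei = Ri^T Ri is a diagonal 0–1 matrix of rank n0; the sizes n = 6 and n0 = 3 are arbitrary examples, not values from the paper.

```r
## Illustration of (2.1): R_i selects n0 of the n rows and E_i = R_i^T R_i is a
## diagonal 0/1 matrix with rank n0. The sizes below are examples only.
n <- 6; n0 <- 3
rows <- sample.int(n, n0)
Ri <- diag(n)[rows, , drop = FALSE]   # n0 x n selection (projection) operator
Ei <- t(Ri) %*% Ri                    # n x n diagonal matrix of rank n0
stopifnot(all(diag(Ei) %in% c(0, 1)), sum(diag(Ei)) == n0)
```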
where rank(Σ_{j=1}^{l} E_{r_j}) = n and l < r. We then obtain the following equivalence proposition between the PMLE and the GLS estimator.

Proposition 2.1. When {E_{r_j}, j = 1, …, l} satisfies Eq. (2.6), the PMLE of MLRMs is a GLS estimator.
3. Rank of projections and eigenvalue bounds

In this section, we investigate the properties of the PMLE, namely the ranks of the projections {Pi = X(X^T Ei X)^− X^T Ei, i = 1, 2, …, r} and the eigenvalue bounds of {Xi^T Xi ± X^T(I − Ei)X, i = 1, 2, …, r}.
The following theorem concerns the rank of the projections for the PMLE in multiple linear models.
Theorem 3.1. Assuming that the projections are {Pi = X(X^T Ei X)^− X^T Ei, i = 1, 2, …, r} in (2.2), we then have

max_i {rank(Pi)} = rank(X).

For the eigenvalue bounds, write the eigendecompositions

X^T X = P diag(λ1², …, λp²) P^T,
Xi^T Xi = P̃ diag(ν1², …, νp²) P̃^T,

so that, with V^T V = X^T(I − Ei)X,

Xi^T Xi + V^T V = X^T Ei X + X^T(I − Ei)X = X^T X,

ν1² − λmax(V^T V) ≤ λmin(Xi^T Xi − V^T V) ≤ ν1².    (3.2)
We assign y = (y1, …, yp)^T with yy^T = V^T V and y1 ≤ ⋯ ≤ yp. Define the projections of the vector y onto the eigenvectors of Xi^T Xi,

y_{i:j} = (p_i, …, p_j)^T y,  1 ≤ i ≤ j ≤ p.

For a fixed Ei, the smallest and largest eigenvalues are bounded in terms of projections onto the two-dimensional subspaces below.
L± = [ν1²  0; 0  ν2²] ± [y1²  y1‖y_{2:p}‖; y1‖y_{2:p}‖  ‖y_{2:p}‖²],

U± = [ν1²  0; 0  ν2²] ± [y1²  y1 y2; y1 y2  y2²],

ν1² ≤ λmin(L+) ≤ λmin(U+),  ν1² − µp ≤ λmin(L−) ≤ λmin(U−) ≤ ν1².

For the largest eigenvalue,

L± = [νp²  0; 0  ν_{p−1}²] ± [y1²  y1 y2; y1 y2  y2²],

U± = [νp²  0; 0  ν_{p−1}²] ± [y1²  y1‖y_{2:p}‖; y1‖y_{2:p}‖  ‖y_{2:p}‖²].

Then λmax(L±) ≤ λmax(Xi^T Xi ± V^T V) ≤ λmax(U±), where

νp² ≤ λmax(L+) ≤ λmax(U+),  ν_{p−1}² ≤ λmax(L−) ≤ λmax(U−) ≤ νp².
4. Consistency of the PMLE

P(‖β̃ − β‖ ≥ ε) ≤ E‖β̃ − β‖ / ε,
‖β̂ − β‖ ≤ ϵ · p · ‖β‖

is satisfied, where β is the true solution, β̂ is the computed solution, p is the rank of X, and ϵ > 0.
We initially require a preliminary lemma with fixed Ei , which is due to Stewart.
Theorem 4.2. Considering the PMLE of MLRMs in (1.1), we denote the following:
Qi = (X T Ei X )− X T Ei , Pi = X (X T Ei X )− X T Ei = XQi , i = 1, . . . , r .
Then constants CX and C̄X exist such that
∥Qi ∥ ≤ CX ; ∥Pi ∥ ≤ C̄X , i = 1, . . . , r .
Theorem 4.3. For β̂i in (2.3) and β̂ in (1.3), β̂ is the stable solution of (1.1). Assume that
∥(X T Ei X )− X T Ei − (X T X )−1 X T ∥ ≤ ϵ · ci · ∥β∥, i = 1, . . . , r ,
where the ci are constants. Then,

‖β̃ − β‖ / ‖β‖ ≤ ϵ · (p + ‖Y‖ · Σ_{i=1}^{r} ci).
Corollary 4.4. For β̂i in (2.3) and β̂ in (1.3), β̂ is the stable solution of (1.1). Assume that
∥(X T Ei X )− X T Ei − (X T X )−1 X T ∥ ≤ ϵ · ∥β∥, i = 1, . . . , r .
Then
‖β̃ − β‖ / ‖β‖ ≤ ϵ · (p + ‖Y‖).
Now, we obtain a theorem and a corollary for the error bound as follows.
Theorem 4.5. For β̂i in (2.3) and β̂ in (1.3), β̂ is the stable solution of (1.1). If
∥Y − X β∥ ≤ ϵ · ∥X ∥ · ∥β∥,
then
‖β̃ − β‖ / ‖β‖ ≤ ϵ[(1 + p)‖(X^T Ei X)^−‖ · ‖X‖² + p].
Corollary 4.6. For β̂i in (2.3) and β̂ in (1.3), β̂ is the stable solution of (1.1). If
∥Y − X β∥ ≤ ϵ · ∥X ∥ · ∥β∥,
Table 1
Consistency check with n0 = 32, p = 8, r = 16.
n εrn Time cost (s) Iterations
then
‖β̃ − β‖ / ‖β‖ ≤ ϵ[(1 + p)‖(X^T Ei X)^−‖ · ‖X‖² + p + CX · ‖X‖].
5. Experimental studies

In this section, we validate the effectiveness of the PMLE method through experiments. The experiments were deployed on a 16-node Beowulf cluster, each node with dual 3 GHz Intel CPUs and 3 GB of memory. We ran the R programs with Rmpi on Red Hat Linux 6.2, using the MPICH2 and SNOW implementations. We designed a series of consistency experiments to confirm that the PMLE method works correctly. We then examined the ratios of outliers in the MLRMs using our PMLE method. In addition, we tested the scalability of the method, which was then used to fit an actual large data set. The parallel deployment is sketched below.
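The sketch below shows one way the per-subsample fits could be distributed with snow over Rmpi, reusing the serial pmle() pieces above; the cluster size, cluster type, and exported variable names are assumptions for illustration rather than the authors' exact configuration.

```r
## Hypothetical snow/Rmpi deployment of the per-subsample fits; X, Y and n0 are
## assumed to exist on the master, and the 16-worker MPI cluster is an example.
library(snow)
cl <- makeCluster(16, type = "MPI")            # one worker per MPI process
clusterExport(cl, c("X", "Y", "n0"))           # ship the data to the workers
idx <- replicate(16, sample.int(nrow(X), n0), simplify = FALSE)
beta_hats <- clusterApply(cl, idx, function(rows) {
  Xi <- X[rows, , drop = FALSE]
  Yi <- Y[rows]
  solve(crossprod(Xi), crossprod(Xi, Yi))      # MLE on each worker's subsample
})
beta_tilde <- Reduce(`+`, beta_hats) / length(beta_hats)  # combine into the PMLE
stopCluster(cl)
```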
The theorems in Section 4 show that the PMLE is consistent for MLRMs. We let
ε_rn = ε_rn(X1, …, Xr) = (1/r) Σ_{i=1}^{r} ‖β̂i − β‖,
which is an average error bound. When r is fixed, a decrease in ε_rn as n increases indicates that consistency holds, that is, that the PMLE yields a consistent solution.
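Under this notation, ε_rn is simply the average Euclidean distance between the subsample estimates and the true β. A minimal sketch, assuming the beta_hats list returned by the pmle() sketch above and a known simulation truth beta_true, is:

```r
## Average error bound eps_rn; beta_hats and beta_true are assumed to come from a
## simulation in which the true coefficient vector is known.
eps_rn <- function(beta_hats, beta_true) {
  mean(sapply(beta_hats, function(b) sqrt(sum((b - beta_true)^2))))
}
```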
The problem size is determined by the following dimensions: the sample size n, the subsample size n0, the number of parameters p, and the number of MPI processes r. Starting from an initial solution β^(0) = (β1^(0), …, βp^(0)), the selected routine is run until a PMLE is obtained; β̃ is computed from the β̂i (i = 1, …, r).
Experiment samples were generated and stored for each distinct setting of (n, n0 , p), for the maximum number of r,
ensuring that any two runs with the same dimensions (n, n0 , p, r) used exactly the same sample. Thus, their results were
directly comparable. All runs were distributed in the same manner across the cluster.
Table 1 demonstrates the consistency of the PMLE methods, with n0 , p, r fixed and n varying from a moderate to a large
sample size. The value of εrn is displayed, along with the number of PMLE iterations and time cost (in seconds). The time costs
include reading the data, setting up the problem, and the initial communication. Clearly, εrn decreases as the sample size n increases. Therefore, the computed estimates appear consistent. Furthermore, the time cost roughly doubles as the sample size doubles, whereas the number of PMLE iterations increases slowly.
Table 2 shows results similar to those in Table 1, but with n0, p, and r varied individually. The quality of the solutions is shown as these dimensions are varied. An increase in r causes the expected increase in time cost. In Table 2(a), εrn also decreases as the subsample size n0 increases, while the time cost and the number of iterations increase for large n0. Table 2(b) shows a dramatic increase in computational time as p doubles; the number of PMLE iterations also increases, whereas εrn appears to be unaffected. Table 2(c) shows that changing r does not significantly affect the solutions. If n = 128 and the subsample size is also 128, then it does not make sense to use 16 MPI processes.
The general MLE is slightly less precise in the presence of outliers (an outlier is an observation that is numerically distant from the rest of the data). Thus, studying the ratio of outliers is necessary to reduce their influence. In this study, the number of parameters p and the number of MPI processes r were fixed. If Nn0 is the number of distinct outliers in the subsample and Nn is the number of distinct outliers in the sample, then the ratio of outliers is ρN = Nn0/Nn; a sketch of one possible computation is given below. Tables 3–5 provide the ratios of outliers for different outlier percentages with p = 10 and r = 8.
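The paper does not spell out the outlier-detection rule behind Nn and Nn0, so the sketch below uses a common standardized-residual criterion purely for illustration; only the ratio ρN = Nn0/Nn itself is taken from the text.

```r
## Illustrative outlier ratio rho_N = N_n0 / N_n; the |standardized residual| > 3
## rule is an assumption, not the paper's stated criterion.
outlier_ratio <- function(X, Y, rows) {
  fit      <- lm(Y ~ X - 1)                   # fit on the full sample
  out_full <- which(abs(rstandard(fit)) > 3)  # N_n: outliers in the sample
  out_sub  <- intersect(out_full, rows)       # N_n0: outliers falling in the subsample
  length(out_sub) / max(length(out_full), 1)
}
```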
In Tables 3–5, the PMLE methods have low outlier ratios for large sample sizes and small subsample sizes, but compromise outlier detection for small sample sizes and large subsample sizes. In this case, the optimal relationship between n and n0 is provided. Tables 3–5 show that the optimal relation bounds are n/n0 ∈ [8, 32], n/n0 ∈ [8, 64] and
Table 2
Solution quality varying n0 , p, r.
(a) Vary n0 , using n = 128, p = 32, r = 16
n0    εrn    Time cost (s)    Iterations
1    5.95 × 10−1    0.02    129
2 4.87 × 10−1 0.04 113
4 3.98 × 10−1 0.07 109
8 2.76 × 10−1 0.15 117
16 1.53 × 10−1 0.32 121
32 9.83 × 10−2 0.63 127
64 6.79 × 10−2 1.19 136
128 3.68 × 10−2 2.41 147
(b) Vary p, using n = 256, n0 = 64, r = 16
p    εrn    Time cost (s)    Iterations
1    5.58 × 10−3    0.09    194
2 5.72 × 10−3 0.15 193
4 6.08 × 10−3 0.29 196
8 6.10 × 10−3 0.54 195
16 5.79 × 10−3 1.07 195
32 5.61 × 10−3 2.11 196
Table 3
5% outlier of Y ’s in MLRMs (ρN ).
n \ n0    5    10    20    40    80    160    320
20 0 1 1 – – – –
40 0.50 0.50 1 1 – – –
80 0.50 0.50 0.75 0.75 1 – –
160 0.25 0.38 0.50 0.75 0.87 1 –
320 0.13 0.23 0.31 0.45 0.73 0.87 1
640 0.06 0.13 0.23 0.35 0.46 0.75 0.97
1280 0.03 0.13 0.16 0.26 0.38 0.45 0.78
2560 0.02 0.06 0.13 0.18 0.24 0.37 0.47
Table 4
10% outlier of Y ’s in MLRMs (ρN ).
n \ n0    5    10    20    40    80    160    320
20 1 1 1 – – – –
40 0.50 0.50 1 1 – – –
80 0.50 0.50 0.75 0.75 1 – –
160 0.25 0.38 0.50 0.75 0.87 1 –
320 0.25 0.31 0.45 0.50 0.75 0.94 1
640 0.13 0.23 0.34 0.46 0.52 0.78 0.97
1280 0.06 0.16 0.26 0.38 0.45 0.51 0.89
2560 0.03 0.07 0.15 0.24 0.37 0.47 0.52
n/n0 ∈ [16, 64], respectively. From the variation of n and n0, the PMLE methods are effective for outlier detection in the samples (an outlier detection technique can also be viewed as testing whether an instance is generated by the model or not).
We consider the parallel performance of the PMLE when varying the number of MPI processes r. We examined the time cost and the ‘‘efficiency’’ metric, which is conventionally reported in performance studies. Let c ∈ {n, n0, p} be
Table 5
15% outlier of Y ’s in MLRMs (ρN ).
n \ n0    5    10    20    40    80    160    320
20 1 1 1 – – – –
40 0.50 0.50 1 1 – – –
80 0.50 0.50 0.75 0.75 1 – –
160 0.25 0.38 0.50 0.75 0.87 1 –
320 0.25 0.38 0.48 0.50 0.82 0.94 1
640 0.18 0.23 0.38 0.46 0.52 0.78 0.97
1280 0.09 0.19 0.29 0.38 0.48 0.54 0.81
2560 0.05 0.09 0.18 0.26 0.38 0.49 0.55
Table 6
Time cost and efficiency for the PMLE of MLRMs.
(a) Time cost (s)
Processing nodes (r) 1 2 4 8 16
Table 7
Multiple R-squared of ‘‘lm’’ function with chosen subset in Bank32nh.
an experimental variable under observation. Define Tr(c) as the time cost in seconds to compute a problem of size c using r processes. The speedup is defined as Sr(c) = T1(c)/Tr(c), where Sr(c) close to r suggests ideal parallel performance. Efficiency is defined as Er(c) = Sr(c)/r, where Er(c) close to 1 suggests ideal performance. The same sample was used whenever c was held constant and only the number of processes r varied; this simplifies comparisons between different r, and the parallel runs were checked to produce matching results.
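These definitions reduce to two one-line calculations once the run times have been measured. In the sketch below the timings are made-up placeholders, not values from Table 6.

```r
## Speedup S_r(c) = T_1(c)/T_r(c) and efficiency E_r(c) = S_r(c)/r from measured
## run times; the `times` values are hypothetical placeholders.
procs      <- c(1, 2, 4, 8, 16)
times      <- c(40.2, 20.5, 10.8, 5.9, 3.4)   # T_r(c), in seconds
speedup    <- times[1] / times
efficiency <- speedup / procs
round(rbind(speedup, efficiency), 2)
```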
The samples for our first simulation were generated from the MLRM Y1 = β1x1 + β2x2 + β3x3. In the other simulations, the samples were generated from Y2 = Σ_{l=1}^{10} βl xl and Y3 = Σ_{l=1}^{50} βl xl. Here xl ∼ N(µl, σl²), where the µl and σl² differ across l, and the βl are unknown parameters.
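A simulation along these lines could be generated as follows; the particular means, standard deviations, and coefficient values are illustrative choices, and the sample size is kept much smaller than the n = 1.6 × 10^6 used in the paper so that the snippet runs quickly.

```r
## Sketch of a simulated design Y = sum_l beta_l * x_l + error with
## x_l ~ N(mu_l, sigma_l^2); the mu_l, sigma_l and beta values are illustrative.
gen_sample <- function(n, p, beta) {
  X <- sapply(seq_len(p), function(l) rnorm(n, mean = l, sd = 1 + 0.1 * l))
  Y <- X %*% beta + rnorm(n)
  list(X = X, Y = as.vector(Y))
}
sim <- gen_sample(n = 10000, p = 50, beta = runif(50, -1, 1))
```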
Table 6 reports the execution time for a sample size of n = 1.6 × 10^6. On one hand, with a subsample size of n0 = 50 000, the speed-up obtained is approximately linear in the number of available nodes. On the other hand, an important factor is the rank of X, namely p: the execution time increases as p increases. In addition, increasing the number of nodes r decreases the time cost for large data sets.
Table 6 shows the results of the simulations with varying r. For each fixed n, increasing the number of processes r strongly affects the time cost; the result for p = 50 is particularly good. In Table 6(a), for p = 3, 10 and 50, doubling r almost halves the time cost. As r increases, the run time decreases significantly. Thus, our method significantly reduces the time cost for sufficiently large dimension sizes, especially p = 50. Table 6(b) shows the efficiency as r varies for p = 3, 10 and 50. The time cost is not halved indefinitely as r doubles; the optimal values are r = 4 for p = 3, r = 8 for p = 10 and r = 16 for p = 50.
We examined whether the advantages of the proposed method remain valid on actual data sets. For this purpose, we used a sample of bankruptcy data from [9]. In this data set, Bank32nh includes 4500 observed samples, with 31 continuous attributes and two output variables (mxql and rej). The MLE was obtained by fitting Bank32nh using the lm function in R. The multiple R-squared was 0.4156, and the F-statistic was 102.5 on 31 and 4468 DF for mxql. Subsets of Bank32nh were then selected to examine the parallel maximum likelihood. Let r = 7 and rank(Ei) = n0 = 51 in Eq. (2.3), where the matrices Ri are fixed. If we choose r larger than 7, for example r = 16, we obtain the same result as in Table 6
in terms of time cost. Table 7 shows a series of multiple R-squared values for these subset maximum likelihoods, together with an estimator of mxql:
mxql = 4.61749 − 0.07253a1cx + 0.2806a1cy − 0.2821a1sx + 0.1796a1sy + 0.15413a1rho
− 0.04706a1pop + 0.02655a2cx − 0.05268a2cy − 0.05536a2sx + 0.1238a2sy
+ 0.3376a2rho − 0.03203a2pop + 0.20188a3cx + 0.2771a3cy − 0.05074a3sx
− 0.1831a3sy + 0.38922a3rho + 0.07905a3pop − 0.3564temp + 0.05736b1x
+ 0.2942b1y + 0.25223b1call − 0.24556b1eff − 0.1871b2x + 0.034925b2y + 0.282143b2call
− 0.18141b2eff − 0.25519b3x − 0.23129b3y + 0.312347b3call + 0.054851b3eff .
We obtained the above PMLE of mxql for the 31 attributes, which is a weighted least-squares (WLS) estimator formed from these subset estimators with weights 1/7. The statistical properties of our estimator are therefore the same as those of the WLS estimator. At the same time, Table 7 indicates that each subset estimator's multiple R-squared is larger than the multiple R-squared of the MLE.
A suitable subset can be found once we have a PMLE of mxql for the 4500 observed samples. In particular, we can choose r = 1, rank(Ei) = 51 and subset = 350:400. The multiple R-squared value of our method is then 0.861, which is much larger than that obtained on the whole sample. The effectiveness of the method is thus related to the chosen subset; a sketch of these fits is given below.
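For reference, the kind of lm calls behind the full fit and one subset fit look like the sketch below, assuming the Bank32nh data from [9] are available locally as a data frame bank holding the 31 attributes and the response mxql; the data set itself is not reproduced here.

```r
## Hypothetical reproduction of the full and subset fits; `bank` is assumed to be
## a data frame with the 31 attributes and mxql from Bank32nh [9].
full_fit   <- lm(mxql ~ ., data = bank)              # MLE on all 4500 samples
subset_fit <- lm(mxql ~ ., data = bank[350:400, ])   # chosen subset, n0 = 51 rows
summary(full_fit)$r.squared                          # about 0.4156 per the text
summary(subset_fit)$r.squared                        # about 0.861 per the text
```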
6. Conclusions and future research

This paper presented the PMLE method for MLRMs, together with several of its properties and its computational efficiency. Effective estimation and short run-times for large data sets were achieved when using the method to fit these models.
The PMLE method is effective for computing MLRMs. The effects of adjusting the sample size, the subsample size, and the number of parameters were studied through simulations. Increasing the sample size verified the consistency of the PMLE method. Increasing the number of parameters quickly increased the run-time. Increasing the subsample size similarly increased the run-time but improved the quality of the estimates. Varying the sample and subsample sizes showed excellent performance. Despite the increased run-time, a large number of parameters allowed the parallel method to reach its best efficiency; however, the PMLE with an extremely large number of parameters would be infeasible to maximize, even on a large cluster.
In addition, outlier experiments were presented to obtain the optimal relationship between the sample and subsample sizes. Finally, the PMLE method was used in a more realistic analysis, yielding a significant improvement in performance using only a few computation nodes. The method can be applied to statistical computations in general; here, in particular, we computed the PMLE for MLRMs.
For future research, we will study other asymptotic properties of the PMLE method with a working matrix, tolerance regions, the choice of block length in the new method, the case of Wishart matrices, computational cost, and so on. The best subset can be selected using certain rules, such as AIC and BIC, which are important aspects for future study.
Acknowledgments
We would like to thank Prof. M. J. Goovaerts for useful help and suggestions. This work was supported by the NSFC under grants 10921101, 91130003, 11171189 and 11326183, the China Postdoctoral Science Foundation (135569), the NBSC under grant 2012LY017, and the Shandong Natural Science Foundation (ZR2011AZ002).
Appendix

then

β̃ = (1/r) Σ_{i=1}^{r} (X^T Ei X)^− X^T Ei Y = (X^T Σ_{j=1}^{l} E_{r_j} X)^{−1} X^T (Σ_{j=1}^{l} E_{r_j}) Y = (X^T Σ0^{−1} X)^{−1} X^T Σ0^{−1} Y.
Thus we obtain the equivalence proposition. The above estimator is a WLS estimator since Σ0 is a diagonal matrix.
rank(X(X^T Ei X)^− X^T Ei) = rank [X(X^T Ei X)^− X^T Ei   X; 0   Ei X] − rank(Ei X)
= rank [0   X; Ei X(X^T Ei X)^− X^T Ei   0] − rank(Ei X)
= rank(X),

and

min_i {rank(Pi)} = min_i {rank(X(X^T Ei X)^− X^T Ei)} = min_i {rank(Ei X(X^T Ei X)^− X^T Ei)}.
and
I 0 0 0
= .
0 Ei X (X T Ei X )− T
X Ei Ei X (X Ei X )− X T Ei
T
(X T X )z = λmin (X T X )z , ∥z ∥ = 1.
Write

(z1; z_{2:p}) = (p1^T z; P̃1^T z) = P̃^T z;

then

λmin(X^T X) = z^T (X^T X) z = z_{2:p}^T Λ1 z_{2:p} + ν1² |z1|² + |y^T z|²
≥ ν2² ‖z_{2:p}‖² + ν1² |z1|² + |y_{2:p}^T z_{2:p} + y1 z1|²
= (z1; z_{2:p})^T { [ν1²  0; 0  ν2² I_{p−1}] + (y1; y_{2:p})(y1; y_{2:p})^T } (z1; z_{2:p}).
Let Q be a matrix of order p − 1 such that Q y_{2:p} = ‖y_{2:p}‖ e_{p−1}, and set w = (z1; Q z_{2:p}), where ‖w‖ = 1. Then

λmin(X^T X) ≥ w^T { [ν1²  0; 0  ν2² I_{p−1}] + (y1; y_{2:p})(y1; y_{2:p})^T } w
≥ λmin [L+  0; 0  ν2² I_{p−2}] = min{ν2², λmin(L+)}.

Applying (3.1) to L+ gives ν1² ≤ λmin(L+) ≤ ν2², and
Proof of Theorem 4.3. According to Wilkinson [18], since β̂ is the stable solution, ‖β̂ − β‖ ≤ ϵ · p · ‖β‖. We have

Thus

‖β̃ − β‖ / ‖β‖ ≤ ϵ · (p + ‖Y‖ · Σ_{i=1}^{r} ci),

as claimed.
References
[1] B. Eksioglu, R. Demirer, I. Capar, Subset selection in multiple linear regression: a new mathematical programming approach, Comput. Ind. Eng. 49
(2005) 155–167.
[2] T.J. Mitchell, J.J. Beauchamp, Bayesian variable selection in linear regression, J. Amer. Statist. Assoc. 83 (1988) 1023–1032.
[3] T. Havránek, Z. Stratkoš, On practical experience with parallel processing of linear models, Bull. Int. Statist. Inst. 53 (1989) 105–117.
[4] M. Xu, E. Wegman, J. Miller, Parallelizing multiple linear regression for speed and redundancy: an empirical study, J. Stat. Comput. Simul. 39 (1991)
205–214.
[5] J. Skvoretz, S. Smith, C. Baldwin, Parallel processing applications for data analysis in the social sciences, Concurrency, Pract. Exp. 4 (1992) 207–221.
[6] R. Bouyouli, K. Jbilou, R. Sadaka, H. Sadok, Convergence properties of some block Krylov subspace methods for multiple linear systems, J. Comput.
Appl. Math. 196 (2006) 498–511.
[7] E. Wegman, On Some Statistical Methods for Parallel Computation, in: Statistics Textbooks and Monographs, vol. 184, 2006, pp. 285–306.
[8] G. Guo, Parallel statistical computing for statistical inference, J. Stat. Theory Pract. 6 (2012) 536–565.
[9] D.P. Foster, R.A. Stine, Variable selection in data mining: building a predictive model for bankruptcy, J. Amer. Statist. Assoc. 99 (2004) 303–313.
[10] R. Coppi, P. D’Urso, P. Giordani, A. Santoro, Least squares estimation of a linear regression model with LR fuzzy response, Comput. Statist. Data Anal.
51 (2006) 267–286.
[11] G. Guo, Schwarz methods for quasi-likelihood in generalized linear models, Comm. Statist. Simulation Comput. 37 (2008) 2027–2036.
[12] G. Guo, S. Lin, Schwarz method for penalized quasi-likelihood in generalized additive models, Comm. Statist. Theory Methods 39 (2010) 1847–1854.
[13] G. Guo, W. Zhao, Schwarz methods for quasi stationary distributions of Markov chains, Calcolo 49 (2012) 21–39.
[14] M. Schervish, Applications of parallel computation to statistical inference, J. Amer. Statist. Assoc. 83 (1988) 976–983.
[15] A. Stamatakis, RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models, Bioinformatics 22 (2006)
2688–2690.
[16] S.J. Steel, D.W. Uys, Influential data cases when the Cp criterion is used for variable selection in multiple linear regression, Comput. Statist. Data Anal.
50 (2006) 1840–1854.
[17] Y. Tian, Y. Takane, Some properties of projectors associated with the WLSE under a general linear model, J. Multivariate Anal. 99 (2008) 1070–1082.
[18] J.H. Wilkinson, The Algebraic Eigenvalue Problem (Vol. 87), Clarendon Press, Oxford, 1965.