Relevance Vector Machines for Earthquake Response Spectra

2012 American Transactions on Engineering & Applied Sciences

http://TuEngr.com/ATEAS, http://Get.to/Research

Jale Tezcan a*, Qiang Cheng b

a Department of Civil and Environmental Engineering, Southern Illinois University Carbondale, Carbondale, IL 62901, USA
b Department of Computer Science, Southern Illinois University Carbondale, Carbondale, IL 62901, USA

ARTICLE INFO

Article history:
Received 23 August 2011
Received in revised form 23 September 2011
Accepted 26 September 2011
Available online 26 September 2011

Keywords: Response spectrum; Ground motion; Supervised learning; Bayesian regression; Relevance Vector Machines

ABSTRACT

This study uses Relevance Vector Machine (RVM) regression to develop a probabilistic model for the average horizontal component of 5%-damped earthquake response spectra. Unlike conventional models, the proposed approach does not require a functional form, and constructs the model based on a set of predictive variables and a set of representative ground motion records. The RVM uses Bayesian inference to determine the confidence intervals, instead of estimating them from the mean squared errors on the training set. An example application using three predictive variables (magnitude, distance and fault mechanism) is presented for sites with shear wave velocities ranging from 450 m/s to 900 m/s. The predictions from the proposed model are compared to an existing parametric model. The results demonstrate the validity of the proposed model, and suggest that it can be used as an alternative to the conventional ground motion models. Future studies will investigate the effect of additional predictive variables on the predictive performance of the model.

*Corresponding author (J. Tezcan). Tel/Fax: +001-618-4536125. E-mail address: jale@siu.edu. 2012. American Transactions on Engineering & Applied Sciences. Volume 1 No.1 ISSN 2229-1652 eISSN 2229-1660. Online Available at http://TUENGR.COM/ATEAS/V01/25-39.pdf

1. Introduction
Reliable prediction of ground motions from future earthquakes is one of the primary
challenges in seismic hazard assessment. Conventional ground motion models are based on
parametric regression, which requires a fixed functional form for the predictive model. Because the
mechanisms governing ground motion processes are not fully understood, identification of the
mathematical form of the underlying function is a challenge. Once a functional form is selected,
the model is fit to the data and the model coefficients minimizing the mean squared errors
between the model and the data are determined. When the selected mathematical form does not
accurately represent the actual input-output relationship, this approach is susceptible to
overfitting. Indeed, using a sufficiently complex model, one can achieve a perfect fit to the
training data, regardless of the selected mathematical form. However, a perfect fit to the
training data says little about the predictive performance of the model on new data.

Kernel regression offers a convenient way to perform regression without a fixed parametric form, or any knowledge of the underlying probability distribution. A special form of kernel regression, called Support Vector Regression (SVR) (Drucker et al., 1997), is characterized by its compact representation and its high generalization performance. In SVR, the training data is first transformed into a high dimensional kernel space, and linear regression is performed on the transformed data. The resulting model is a linear combination of nonlinear kernel functions evaluated at a subset of the training input. Combination weights are determined by minimizing a penalized residual function. The SVR has proved successful in many studies since its introduction in 1997. The effectiveness of SVR in ground motion modeling has been recently demonstrated (Tezcan and Cheng, 2011; Tezcan et al., 2010). A well-known weakness of the SVR is the lack of probabilistic outputs. Although confidence intervals can be constructed using the mean-squared errors, similar to the approach used in conventional ground motion models, the posterior probabilities, which produce the most reliable estimate of prediction intervals, are not given. The lack of probabilistic outputs in the SVR formulation has motivated the development of a new kernel regression model called the Relevance Vector Machine (RVM) (Tipping, 2000), which operates in a Bayesian framework.

To overcome the limitations of parametric regression while obtaining probabilistic
predictions, this paper proposes a new ground motion model based on RVM regression.
Unlike standard ground motion models, which make point estimates of the optimal value of the
weights by minimizing the fitting error, the RVM model treats the model coefficients as random
variables with independent variances and attempts to find the model that maximizes the likelihood
of the observations. This approach offers two main advantages over the conventional ground
motion models. First, the prediction uncertainty is explicitly determined using Bayesian
inference, as opposed to being estimated from the mean squared errors. Second, the complexity of
the RVM model is controlled by assigning suitable prior distributions over the model coefficients,
which reduces the overfit susceptibility of the model.

The rest of the paper is organized as follows. In Section 2, the RVM regression algorithm is described. Section 3 is devoted to the construction of the ground motion model. Starting with the description of the ground motion data and the predictive and target variables, the training results are presented, and the prediction procedure for new data is described. Section 4 presents computational results and compares the RVM predictions to an existing empirical parametric model. Section 5 concludes the paper by presenting the main conclusions of this study, and discusses the advantages and limitations of the proposed method.

2. The RVM Regression Algorithm
Given a set of input vectors $\{\mathbf{x}_i\}$, $i = 1:N$, and corresponding real-valued targets $\{t_i\}$, the regression task is to estimate the underlying input-output relationship. Using kernel representation (Smola and Schölkopf, 2004), the regression function can be written as a linear combination of a set of nonlinear kernel functions:

$$y(\mathbf{x}) = \sum_{i=1}^{N} w_i K(\mathbf{x}, \mathbf{x}_i) + w_0, \qquad (1)$$

where $w_i$, $i = 1 \ldots N$, are the combination weights and $w_0$ is the bias term.

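This study uses the radial basis function (RBF) kernel:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2\right), \quad \gamma > 0, \qquad (2)$$

where $\gamma$ is the width parameter controlling the trade-off between model accuracy and complexity. In this study, the width parameter has been determined using cross-validation.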
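
As a minimal sketch of Eq. (2), the kernel matrix between two sets of input vectors can be computed as below. The code assumes the common Gaussian RBF form exp(-gamma * ||x_i - x_j||^2); the function name `rbf_kernel`, the parameter name `gamma` and the toy inputs are illustrative and not taken from the paper.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """Return K with K[i, j] = exp(-gamma * ||X1[i] - X2[j]||^2)."""
    X1 = np.atleast_2d(np.asarray(X1, dtype=float))
    X2 = np.atleast_2d(np.asarray(X2, dtype=float))
    # Pairwise differences via broadcasting: shape (len(X1), len(X2), dim).
    diff = X1[:, None, :] - X2[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)

# Toy inputs (hypothetical, for illustration only).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X, X, gamma=0.5)
```

With this parameterization a larger `gamma` gives narrower basis functions, so each kernel responds to a smaller neighborhood of its center.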

Assuming independent noise samples from a zero-mean Gaussian distribution, i.e., $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, the target values can be written as:

$$t_i = y(\mathbf{x}_i) + \epsilon_i, \quad i = 1, \ldots, N. \qquad (3)$$

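Recast in matrix form, Equation (3) becomes:

$$\mathbf{t} = \Phi\mathbf{w} + \boldsymbol{\epsilon}, \qquad (4)$$

where $\mathbf{t} = [t_1, \ldots, t_N]^T$, $\mathbf{w} = [w_0, \ldots, w_N]^T$, and $\Phi$ is an $N \times (N+1)$ basis matrix with $\Phi_{i1} = 1$ and $\Phi_{i,j+1} = K(\mathbf{x}_i, \mathbf{x}_j)$. The likelihood of the entire set, assuming independent observations, is given by:

$$p(\mathbf{t} \mid \mathbf{w}, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{\|\mathbf{t} - \Phi\mathbf{w}\|^2}{2\sigma^2}\right\}, \qquad (5)$$

where $\mathbf{w} = [w_0, \ldots, w_N]^T$ is the vector containing the mean values of the combination weights. To control the complexity of the model, a zero-mean Gaussian prior is used where each weight is assigned a different variance (MacKay, 1992):

$$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}(w_i \mid 0, 1/\alpha_i). \qquad (6)$$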


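In Eq. (6), $\boldsymbol{\alpha} = [\alpha_0, \ldots, \alpha_N]^T$, where $1/\alpha_i$ is the variance of $w_i$. The posterior distribution of the weights is obtained as:

$$p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}, \sigma^2) = (2\pi)^{-(N+1)/2} |\mathbf{C}|^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{w} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{w} - \boldsymbol{\mu})\right\}, \qquad (7)$$

where the mean vector $\boldsymbol{\mu}$ and covariance matrix $\mathbf{C}$ are:

$$\boldsymbol{\mu} = \sigma^{-2}\, \mathbf{C}\, \Phi^T \mathbf{t} \qquad (8)$$

$$\mathbf{C} = \left(\sigma^{-2}\, \Phi^T \Phi + \mathbf{A}\right)^{-1} \qquad (9)$$

with

$$\mathbf{A} = \begin{bmatrix} \alpha_0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_N \end{bmatrix}. \qquad (10)$$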
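
The posterior statistics in Eqs. (8)-(10) reduce to a few lines of linear algebra. The sketch below is illustrative only: it assumes the RBF kernel form exp(-gamma * ||x_i - x_j||^2), writes the posterior mean and covariance as `mu` and `C`, and uses synthetic data and fixed values of `alpha` and `sigma2` that are not from the paper.

```python
import numpy as np

def design_matrix(X, gamma):
    """N x (N+1) basis matrix: a column of ones followed by the RBF kernel matrix."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.hstack([np.ones((X.shape[0], 1)), np.exp(-gamma * sq_dist)])

def weight_posterior(Phi, t, alpha, sigma2):
    """Posterior mean and covariance of the weights for fixed alpha and sigma2."""
    A = np.diag(alpha)                              # Eq. (10)
    C = np.linalg.inv(Phi.T @ Phi / sigma2 + A)     # Eq. (9)
    mu = C @ Phi.T @ t / sigma2                     # Eq. (8)
    return mu, C

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 2))
t = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(20)

Phi = design_matrix(X, gamma=1.0)
mu, C = weight_posterior(Phi, t, alpha=np.ones(Phi.shape[1]), sigma2=0.01)
```

The explicit inverse is kept because the diagonal of `C` is needed later by the hyperparameter updates; if only `mu` were required, solving the linear system directly would be preferable.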

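The marginal likelihood of the dataset can be determined by integrating out the weights (MacKay, 1992) as follows:

$$p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2) = (2\pi)^{-N/2} |\boldsymbol{\Omega}|^{-1/2} \exp\left\{-\tfrac{1}{2}\, \mathbf{t}^T \boldsymbol{\Omega}^{-1} \mathbf{t}\right\}, \qquad (11)$$

where $\boldsymbol{\Omega} = \sigma^2 \mathbf{I} + \Phi \mathbf{A}^{-1} \Phi^T$ and $\mathbf{I}$ is the identity matrix of size $N$. Ideal Bayesian inference requires defining prior distributions over $\boldsymbol{\alpha}$ and $\sigma^2$, followed by marginalization. This process, however, will not result in a closed form solution. Instead, the $\boldsymbol{\alpha}$ and $\sigma^2$ values maximizing Eq. (11) can be found iteratively as follows (MacKay, 1992):

$$\alpha_i^{new} = \frac{1 - \alpha_i C_{ii}}{\mu_i^2} \qquad (12)$$

$$(\sigma^2)^{new} = \frac{\|\mathbf{t} - \Phi\boldsymbol{\mu}\|^2}{N - \sum_i (1 - \alpha_i C_{ii})}. \qquad (13)$$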
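
For monitoring the iteration, it can help to evaluate the logarithm of the marginal likelihood in Eq. (11). The following is a sketch under the assumption that the marginal covariance is sigma^2 I + Phi A^{-1} Phi^T, which is the standard result for this model; the function name, the name `Omega` and the toy numbers are illustrative.

```python
import numpy as np

def log_marginal_likelihood(Phi, t, alpha, sigma2):
    """Log of Eq. (11) for given hyperparameters alpha and sigma2."""
    N = Phi.shape[0]
    # Covariance of t after integrating out the weights.
    Omega = sigma2 * np.eye(N) + Phi @ np.diag(1.0 / alpha) @ Phi.T
    _, logdet = np.linalg.slogdet(Omega)        # log-determinant, numerically stable
    quad = t @ np.linalg.solve(Omega, t)        # t^T Omega^{-1} t without explicit inverse
    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + quad)

# Toy comparison of two noise levels (hypothetical numbers).
Phi_demo = np.array([[1.0, 1.0], [1.0, 0.5], [1.0, 0.1]])
t_demo = np.array([1.0, 0.4, 0.1])
alpha_demo = np.array([1.0, 1.0])
lml_small_noise = log_marginal_likelihood(Phi_demo, t_demo, alpha_demo, sigma2=0.01)
lml_large_noise = log_marginal_likelihood(Phi_demo, t_demo, alpha_demo, sigma2=100.0)
```

Working in the log domain avoids underflow for realistic N, and a value that stops increasing between iterations is a practical convergence signal.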

Because the numerator in Eq. (12) is a positive number with a maximum value of 1, an $\alpha_i$ value tending to infinity implies that the posterior distribution of $w_i$ is infinitely peaked at zero, i.e. $w_i = 0$. As a consequence, the corresponding kernel function can be removed from the model. The procedure for determining the weights and the noise variance can be summarized as follows:

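1) Select a width parameter $\gamma$ of the kernel function and form the basis matrix $\Phi$.
2) Initialize $\boldsymbol{\alpha} = [\alpha_0, \ldots, \alpha_N]^T$ and $\sigma^2$.
3) Compute matrix $\mathbf{A}$ using Eq. (10).
4) Compute the covariance matrix $\mathbf{C}$ using Eq. (9).
5) Compute the mean vector $\boldsymbol{\mu}$ using Eq. (8).
6) Update $\boldsymbol{\alpha}$ and $\sigma^2$ using Eq. (12) and Eq. (13).
7) If $\alpha_i \to \infty$, set $w_i = 0$ and remove the corresponding column in $\Phi$.
8) Go back to step 3 until convergence.
9) Set the remaining weights equal to $\boldsymbol{\mu}$.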
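
The steps above can be sketched as a short training loop. This is an illustrative implementation under stated assumptions, not the authors' code: the starting values for the precisions and noise variance, the finite threshold `alpha_max` that stands in for "tends to infinity", the convergence tolerance and the synthetic one-dimensional data are all choices made here. The bias column is treated like any other column and may be pruned.

```python
import numpy as np

def posterior(P, t, alpha, sigma2):
    """Eqs. (8)-(10): posterior mean and covariance of the active weights."""
    C = np.linalg.inv(P.T @ P / sigma2 + np.diag(alpha))
    mu = C @ P.T @ t / sigma2
    return mu, C

def train_rvm(Phi, t, n_iter=1000, alpha_max=1e6, tol=1e-4):
    """Sketch of steps 2-9; returns kept column indices, mu, C and sigma2."""
    N = Phi.shape[0]
    keep = np.arange(Phi.shape[1])      # columns of Phi still in the model
    alpha = np.ones(keep.size)          # step 2 (starting values are assumptions)
    sigma2 = 0.1 * np.var(t)
    for _ in range(n_iter):
        P = Phi[:, keep]
        mu, C = posterior(P, t, alpha, sigma2)                   # steps 3-5
        # Numerator of Eq. (12), clipped to guard against round-off.
        g = np.clip(1.0 - alpha * np.diag(C), 1e-12, 1.0)
        alpha_new = g / np.maximum(mu ** 2, 1e-300)              # Eq. (12)
        resid = t - P @ mu
        sigma2_new = (resid @ resid) / max(N - g.sum(), 1e-12)   # Eq. (13)
        change = np.max(np.abs(np.log(alpha_new) - np.log(alpha)))
        alpha, sigma2 = alpha_new, sigma2_new                    # step 6
        active = alpha < alpha_max                               # step 7
        keep, alpha = keep[active], alpha[active]
        if change < tol or keep.size == 0:                       # step 8
            break
    mu, C = posterior(Phi[:, keep], t, alpha, sigma2)            # step 9
    return keep, mu, C, sigma2

# Synthetic 1-D example (hypothetical data, not the ground motion records).
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 40)
t = np.sin(3.0 * x) + 0.1 * rng.standard_normal(x.size)
K = np.exp(-4.0 * (x[:, None] - x[None, :]) ** 2)
Phi = np.hstack([np.ones((x.size, 1)), K])

keep, mu, C, sigma2 = train_rvm(Phi, t)
rmse = np.sqrt(np.mean((t - Phi[:, keep] @ mu) ** 2))
```

Entries of `keep` greater than zero index the training points retained as relevance vectors (entry 0, if present, is the bias column).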

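The training input points corresponding to the remaining nonzero weights are called the “relevance vectors”. After the weights and the noise variance are determined, the predictive mean for a new input $\mathbf{x}^*$ can be found as follows:

$$y^* = \boldsymbol{\mu}^T \Phi(\mathbf{x}^*). \qquad (14)$$

In Eq. (14), $\Phi(\mathbf{x}^*) = [1, K(\mathbf{x}^*, \mathbf{r}_1), K(\mathbf{x}^*, \mathbf{r}_2), \ldots, K(\mathbf{x}^*, \mathbf{r}_{N_r})]^T$, where $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_{N_r}$ are the relevance vectors.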


The total predictive variance can be found by adding the noise variance to the uncertainty due to the variance of the weights, as follows:

$$\sigma_*^2 = \sigma^2 + \Phi(\mathbf{x}^*)^T\, \mathbf{C}\, \Phi(\mathbf{x}^*). \qquad (15)$$
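
Eqs. (14) and (15) can be evaluated together once the relevance vectors, weight statistics and noise variance are available. The sketch below assumes the bias term is retained as the first weight and that the kernel has the form exp(-gamma * ||x - r||^2); the function name and all numerical values are illustrative.

```python
import numpy as np

def rvm_predict(x_new, rel_vectors, mu, C, sigma2, gamma):
    """Predictive mean (Eq. 14) and total predictive variance (Eq. 15)."""
    x_new = np.asarray(x_new, dtype=float)
    R = np.asarray(rel_vectors, dtype=float)
    # Kernel between the new input and each relevance vector.
    k = np.exp(-gamma * np.sum((R - x_new) ** 2, axis=1))
    phi = np.concatenate(([1.0], k))        # basis vector with leading 1 for the bias
    mean = mu @ phi
    var = sigma2 + phi @ C @ phi            # noise variance + weight uncertainty
    return mean, var

# Hypothetical two-relevance-vector model, for illustration only.
rel_vectors = [[0.0, 0.0], [1.0, 1.0]]
mu = np.array([0.5, 1.0, -1.0])             # [bias, w_1, w_2]
C = 0.01 * np.eye(3)
mean, var = rvm_predict([0.0, 0.0], rel_vectors, mu, C, sigma2=0.04, gamma=1.0)
```

The second term of the variance grows where the weights are poorly constrained by the data, which is what distinguishes this interval from one based on the noise variance alone.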

3. Construction of the Ground Motion Model
In this section, the RVM regression algorithm will be used to construct a ground motion model. In Section 4, the resulting model will be compared to an existing parametric model by Idriss (Idriss, 2008), which will be referred to as the “I08 model” in this paper. To enable a fair comparison, the dataset and the predictive variables of the I08 model have been adopted in this study. The RVM algorithm is independent of the size of the predictive variable set; additional variables can be introduced, and the set of predictive variables can be customized to specific applications.

3.1 Ground Motion Data
The ground motion records used in the training have been obtained from the PEER-NGA
database (PEER, 2007). Consistent with the I08 model, a total of 942 free-field records have been
selected using the following criteria:

• Shear wave velocity at the top 30 m ranging from 450 m/s to 900 m/s,
• Magnitude larger than 4.5,
• Closest distance between the station and rupture surface (R) less than 200 km.

Detailed information regarding these records can be found in the paper by Idriss (Idriss, 2008).

3.2 Predictive and Target Variables
The predictive variable set includes moment magnitude ($M$), natural logarithm of the closest distance between the station and the rupture surface in kilometers ($\ln R$) and fault mechanism ($F$). Idriss finds that with the shear wave velocity ($V_{s30}$) constrained to the 450 m/s to 900 m/s range, it has negligible effect on spectral values up to 1 second. Therefore, $V_{s30}$ was not used as a predictive variable. Following the convention used in the I08 model, earthquakes that have been assigned a fault mechanism type 0 and 1 in the PEER database were merged to a single, “strike-slip” group, while the rest were considered to be representative of “reverse” events. In the RVM model, strike-slip and reverse earthquakes are assigned $F = -1$ and $F = 1$, respectively. The input vector representing the ith record has the following form:

$$\mathbf{x}_i = [M_i, \ln R_i, F_i]^T. \qquad (16)$$

A set of eight vibration periods ($N_T = 8$) ranging from 0.01 second to 4 seconds was used in the RVM model. The output for the ith record for the vibration period $T_j$ is defined as:

$$t_{ij} = \ln S_{a,i}(T_j) \quad \text{for } j = 1 \text{ to } N_T. \qquad (17)$$

In Equation (17), $\ln S_a$ is the natural logarithm of the average horizontal component of the 5%-damped pseudo-acceleration response spectrum. The spectral values $S_a$ represent the median value of the geometric mean of the two horizontal components, computed using non-redundant rotations between 0 and 90 degrees (Boore et al., 2006).

3.3 Training of the RVM Regression Model

As a pre-processing step, $M$ and $\ln R$ values were linearly scaled to [-1 1] to achieve uniformity between the ranges of the predictive variables. There is no need to scale the fault mechanism identifier ($F$) as it was already defined to take either -1 or 1. Because kernel functions use Euclidean distances between pairs of input vectors, such scaling will help prevent numerical problems due to large variations between the ranges of the values that variables can take. In the ground motion data used in this study, the ranges of the predictive variables are $4.53 \le M \le 7.68$ and $0.32 \le R \le 199.27$ km. Therefore, input scaling takes the following form:

$$\tilde{M} = \frac{2M - 12.21}{3.15}, \quad \widetilde{\ln R} = \frac{2 \ln R - 4.16}{6.44}, \quad \tilde{F} = F. \qquad (18)$$
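
The constants in Eq. (18) follow from min-max scaling to [-1, 1]: for magnitude, 12.21 = 4.53 + 7.68 and 3.15 = 7.68 - 4.53, and the distance constants correspond in the same way to ln(0.32) and ln(199.27), rounded. A minimal sketch of the scaling, with the function name `scale_inputs` chosen here for illustration:

```python
import math

def scale_inputs(M, R_km, F):
    """Scale magnitude and log-distance to [-1, 1] using the constants of Eq. (18)."""
    m_scaled = (2.0 * M - 12.21) / 3.15
    r_scaled = (2.0 * math.log(R_km) - 4.16) / 6.44
    return [m_scaled, r_scaled, float(F)]   # F is already -1 or 1

# The ends of the data ranges should map close to -1 and 1.
lower = scale_inputs(4.53, 0.32, -1)
upper = scale_inputs(7.68, 199.27, 1)
```

Because the published constants are rounded, the scaled distance at the range limits differs from -1 and 1 in the third or fourth decimal place.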

The optimal value of the kernel width parameter $\gamma$ for each vibration period was determined using 10-fold cross validation (Webb, 2002). In 10-fold cross validation, the training data is randomly partitioned into 10 subsets of equal size; the model is trained using 9 subsets, and the remaining subset is used to compute the validation error. This process is repeated 10 times, each time with a different validation subset, and the average validation error for a particular $\gamma$ is computed. By computing the average validation error over a range of possible $\gamma$ values, the optimal $\gamma$ with the smallest average validation error is determined. The resulting $\gamma$ values for each period are listed in Table 1, along with the standard deviation of noise ($\sigma$), the mean value of the constant term $w_0$ and the number of relevance vectors. The relevance vectors and the combination weights are listed in Table 2.
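
The cross-validation procedure described above can be sketched generically. To keep the example short, a simple kernel ridge regressor stands in for the RVM as the model being tuned; this substitution, the regularization value `lam`, the candidate grid and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rbf(A, B, gamma):
    """RBF kernel matrix between the rows of A and B."""
    return np.exp(-gamma * np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1))

def fit_ridge(X, t, gamma, lam=1e-2):
    """Stand-in model (kernel ridge regression), NOT the RVM itself."""
    coef = np.linalg.solve(rbf(X, X, gamma) + lam * np.eye(len(t)), t)
    return X, coef, gamma

def predict_ridge(model, Xq):
    X_train, coef, gamma = model
    return rbf(Xq, X_train, gamma) @ coef

def cv_error(X, t, gamma, fit, predict, k=10, seed=0):
    """Average validation mean squared error over k folds for one gamma."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(t)), k)   # random partition
    errors = []
    for i in range(k):
        val = folds[i]
        trn = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[trn], t[trn], gamma)
        errors.append(np.mean((t[val] - predict(model, X[val])) ** 2))
    return float(np.mean(errors))

# Synthetic data and candidate grid (hypothetical).
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(60, 2))
t = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(60)

gammas = [0.1, 1.0, 10.0, 100.0]
errors = [cv_error(X, t, g, fit_ridge, predict_ridge) for g in gammas]
best_gamma = gammas[int(np.argmin(errors))]
```

Using the same `seed` for every candidate keeps the fold assignment identical across values of `gamma`, so differences in the average error reflect the width parameter rather than the partition.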

After the RVM models, one for each vibration period, were trained, standardized residuals were computed. Figure 1 shows the distribution of the standardized residuals, corresponding to T=1 second, with respect to $M$, $R$ and $F$. The residual distribution patterns for other periods were similar, not indicating any systematic bias.

Table 1: Kernel width parameter ($\gamma$), logarithmic standard deviation of noise ($\sigma$), mean value of the bias term ($w_0$) and the number of relevance vectors ($N_r$), for each period.

T (sec)   $\gamma$   $\sigma$   $w_0$      $N_r$
0.01      0.23       0.633      -3.069     7
0.05      0.32       0.666      -0.664     7
0.10      0.13       0.718      0.002      7
0.20      0.15       0.661      -15.042    6
0.50      0.25       0.695      -8.359     7
1.00      0.36       0.748      -4.670     5
2.00      0.28       0.869      -6.0548    5
4.00      0.26       0.983      -7.794     5

Figure 1: Standardized residuals for T=1 second.

Table 2: Mean values of the combination weights ($w_i$) and the relevance vectors ($\mathbf{r}_i$).

T = 0.01 s                                  T = 0.05 s
i   w_i       r_i                           i   w_i       r_i
1   13.258    [-0.1937  0.2676 -1]          1   -6.177    [ 0.7905 -0.4227  1]
2   15.393    [ 0.5238 -0.2268  1]          2   6.355     [-0.3841 -0.1783 -1]
3   0.4861    [ 0.8921  0.9414 -1]          3   28.555    [ 0.5238  0.5856  1]
4   -5.073    [ 0.9619 -1.0000  1]          4   -7.930    [-0.5111  0.7896 -1]
5   -4.275    [ 0.9619 -0.6751  1]          5   -0.402    [ 0.7460 -0.4021 -1]
6   -14.173   [-0.2889  0.7862 -1]          6   -12.622   [ 0.9619  0.9545  1]
7   -8.086    [ 0.0603  0.9789  1]          7   -16.194   [ 0.0603  0.9789  1]

T = 0.1 s                                   T = 0.2 s
i   w_i       r_i                           i   w_i       r_i
1   64.423    [ 0.4159 -0.1499  1]          1   29.569    [-0.8921 -0.0837 -1]
2   -6.991    [ 0.9619  0.9545  1]          2   2.293     [ 0.7905 -0.4227  1]
3   -36.297   [ 0.9619 -1.0000  1]          3   35.440    [ 0.8921  0.6543 -1]
4   15.875    [ 1.0000  0.4559 -1]          4   5.7412    [ 0.9619 -1.0000  1]
5   -5.599    [-0.3143  0.0809  1]          5   3.5036    [-0.8222  0.1385  1]
6   -17.361   [ 0.6508  0.9961 -1]          6   -48.496   [ 0.0603  0.4955 -1]
7   -25.799   [-0.1302  0.9056  1]

T = 0.5 s                                   T = 1.0 s
i   w_i       r_i                           i   w_i       r_i
1   6.4551    [ 0.7905 -0.4227  1]          1   1.9699    [ 0.7905 -0.4227  1]
2   12.825    [-0.2317 -0.2931 -1]          2   4.8873    [ 0.0540 -0.2785 -1]
3   0.0283    [-0.7714  0.1214  1]          3   -4.1425   [-0.7524  0.7892  1]
4   -0.806    [ 0.8921 -0.0318 -1]          4   -3.9593   [-0.7651  0.8672 -1]
5   8.4335    [ 0.8921  0.9414 -1]          5   3.7352    [-0.1302 -0.0121  1]
6   -0.089    [ 0.9619  0.9545  1]
7   -12.9     [ 0.0603  0.5786 -1]

T = 2.0 s                                   T = 4.0 s
i   w_i       r_i                           i   w_i       r_i
1   7.3574    [-0.2317 -0.2931 -1]          1   0.4747    [ 0.7460 -0.4021 -1]
2   4.5548    [-0.0730  0.4691  1]          2   11.936    [ 0.7460  0.5118 -1]
3   3.0086    [ 0.9619 -1.0000  1]          3   6.8109    [ 0.3714 -0.0296  1]
4   -6.4695   [-1.0000  0.5142 -1]          4   -5.6050   [-0.7524  0.7892  1]
5   -5.3630   [-0.7524  0.7892  1]          5   -10.180   [ 0.3778  1.0000 -1]

3.4 Prediction Phase
After training, the spectral values for a new input vector $\mathbf{x}^* = [M, \ln R, F]^T$ can be determined as follows:

1. Scale the input to the range [-1 1] using Eq. (18);
2. Construct the basis vector $\Phi(\mathbf{x}^*) = [1, K(\mathbf{x}^*, \mathbf{r}_1), \ldots, K(\mathbf{x}^*, \mathbf{r}_{N_r})]^T$ using the relevance vectors from Table 2 and the kernel width parameter from Table 1;
3. Determine the median value of $\ln S_a$ using Eq. (14);
4. Obtain the standard deviation of the noise from Table 1. Total uncertainty, if needed, can be determined using Eq. (15).
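
As a worked illustration of these four steps, the sketch below evaluates the T = 1.0 s model using the values transcribed from Tables 1 and 2. Three points are assumptions rather than statements from the paper: the kernel is taken as exp(-gamma * ||x - r||^2), strike-slip events are coded as F = -1, and the percentile bounds use only the noise standard deviation from Table 1 (the weight covariance needed for Eq. (15) is not tabulated). The scenario M = 7, R = 10 km is chosen arbitrarily.

```python
import math

# Values for T = 1.0 s transcribed from Table 1 and Table 2.
GAMMA = 0.36          # kernel width parameter
BIAS = -4.670         # mean value of the bias term
SIGMA = 0.748         # logarithmic standard deviation of noise
WEIGHTS = [1.9699, 4.8873, -4.1425, -3.9593, 3.7352]
REL_VECTORS = [
    [0.7905, -0.4227, 1.0],
    [0.0540, -0.2785, -1.0],
    [-0.7524, 0.7892, 1.0],
    [-0.7651, 0.8672, -1.0],
    [-0.1302, -0.0121, 1.0],
]

def scale_inputs(M, R_km, F):
    """Step 1: scale magnitude and log-distance with the constants of Eq. (18)."""
    return [(2.0 * M - 12.21) / 3.15,
            (2.0 * math.log(R_km) - 4.16) / 6.44,
            float(F)]

def ln_sa_median(M, R_km, F):
    """Steps 2-3: kernel values against each relevance vector, then Eq. (14)."""
    x = scale_inputs(M, R_km, F)
    total = BIAS
    for w, r in zip(WEIGHTS, REL_VECTORS):
        sq_dist = sum((xi - ri) ** 2 for xi, ri in zip(x, r))
        total += w * math.exp(-GAMMA * sq_dist)   # assumed RBF form
    return total

# Hypothetical scenario: M = 7, R = 10 km, strike-slip (assumed coding F = -1).
ln_sa = ln_sa_median(7.0, 10.0, -1)
median = math.exp(ln_sa)
# Step 4: 16th and 84th percentiles from the noise standard deviation only.
p16, p84 = math.exp(ln_sa - SIGMA), math.exp(ln_sa + SIGMA)
```

Repeating the calculation over a range of distances for fixed M and F traces out an attenuation curve with its percentile bounds.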

4. Computational Results
The RVM model was tested using different magnitudes, distances and fault mechanisms, and the results were compared to the I08 model. Figure 2 shows the median spectral acceleration at T=1 second, along with the 16th and 84th percentile values (median ±σ bounds) for strike-slip faults, for M=5 (left) and M=7 (right). The circles in the figure show the spectral values from earthquakes with the same fault mechanism and within ±0.25 magnitude units. Figure 3 shows the same information for reverse faults. For periods of about 1 second and longer, it was observed that the median estimates from the RVM model were generally lower than those from the I08 model. At very short distances, within ~20 km of the source, RVM estimates were higher for M=7, for both strike-slip and reverse faulting earthquakes.

Figure 2: Median ±σ bounds for spectral acceleration at T=1 second, strike-slip faults.

Figure 3: Median ±σ bounds for spectral acceleration at T=1 second, reverse faults.

Figure 4 presents the results for vibration period T=0.2 second, for strike-slip earthquakes. The results for the reverse faulting earthquakes were similar. For shorter vibration periods, and M=7, RVM estimates were lower than those from the I08 model. For M=5, however, RVM predictions equaled or exceeded the I08 predictions. Regarding the variation about the median (noise variance), the predictions from the two models were in general agreement for all vibration periods.

Figure 4: Median ±σ bounds for spectral acceleration at T=0.2 second, strike-slip faults.

5. Conclusion
This paper proposes an RVM-based model for the average horizontal component of earthquake response spectra. Given a set of predictive variables and a set of ground motion records, the RVM model predicts the most likely spectral values in addition to their variability. An example application has been presented where the predictions from the RVM model have been compared to an existing, parametric ground motion model. The results demonstrate the validity of the proposed model, and suggest that it can be used as an alternative to the conventional ground motion models.

The RVM model offers the following advantages over its conventional counterparts: (1) There is no need to select a fixed functional form. By determining the optimal variances associated with the weights, the RVM automatically detects the most plausible model; (2) The resulting RVM model has a simple mathematical structure (weighted average of exponential basis functions), and is based on a small number of samples that carry the most relevant information. Samples that are not well supported by the evidence (as measured by the increase in the marginal likelihood) are automatically pruned; (3) Because the model complexity is controlled during the training stage, the RVM has a lower risk of over-fitting.

One limitation of the proposed approach is that the resulting model may be difficult to interpret. Because the RVM is not a physical model, it does not allow any user-defined, physical constraints, and therefore cannot be extended to scenarios not represented in the training data set. However, in our opinion, this does not constitute a shortcoming, considering that the reliability of such practice is questionable in any regression model. Another potential limitation is that the RVM requires a user-defined kernel width parameter, which does not have a very clear intuitive meaning, especially when working with high dimensional input vectors. However, the optimal value of the kernel width parameter can be determined using cross-validation, as has been done in this study. Future studies will investigate the effect of using additional predictive variables on the performance of the model.

6. Acknowledgements
This material is based in part upon work supported by the National Science Foundation under
Grant Number CMMI-1100735.

7. References
Boore, D.M., J. Watson-Lamprey, and N.A. Abrahamson. (2006). Orientation-independent
measures of ground motion. Bulletin of the Seismological Society of America, 96(4A),
1502-1511.

Bozorgnia, Y. and K. W. Campbell. (2004). The vertical-to-horizontal response spectral ratio and
tentative procedures for developing simplified V/H and vertical design spectra. Journal of
Earthquake Engineering, 8(2), 175-207.

Campbell, K. W. and Y. Bozorgnia. (2003). Updated Near-Source Ground-Motion (Attenuation)
Relations for the Horizontal and Vertical Components of Peak Ground Acceleration and
Acceleration Response Spectra. Bulletin of the Seismological Society of America, 93(1),
314-331.

Drucker, H., C. J. C. Burges, L. Kaufman, A. Smola and V. Vapnik. (1997). Support vector
regression machines, Advances in Neural Information Processing Systems 9, MIT Press.

Idriss, I. M. (2008). An NGA empirical model for estimating the horizontal spectral values
generated by shallow crustal earthquakes. Earthquake Spectra, 24(1), 217-242.

MacKay, D. J. C. (1992). Bayesian interpolation. Neural Computation, 4(3), 415-447.

MacKay, D. J. C. (1992). The evidence framework applied to classification networks. Neural
Computation, 4(5), 720-736.

PEER. (2007). PEER-NGA Database. http://peer.berkeley.edu/nga/index.html.

Smola, A. J. and B. Schölkopf. (2004). A tutorial on support vector regression. Statistics and
Computing, 14(3), 199-222.

Tezcan, J. and Q. Cheng. (2011). A Nonparametric Characterization of Vertical Ground Motion
Effects. Earthquake Engineering and Structural Dynamics (in print).

Tezcan, J., Q. Cheng and L. Hill. (2010). Response Spectrum Estimation using Support Vector
Machines, 5th International Conference on Recent Advances in Geotechnical Earthquake
Engineering and Soil Dynamics, San Diego, CA.

Tipping, M. (2000). The relevance vector machine. Advances in Neural Information Processing
Systems MIT Press.

Webb, A. (2002). Statistical pattern recognition, New York, John Wiley and Sons.

Dr. Jale Tezcan is an Associate Professor in the Department of Civil and Environmental Engineering at Southern Illinois University Carbondale. She earned her Ph.D. from Rice University, Houston, TX in 2005. Dr. Tezcan’s research interests include earthquake engineering, material characterization, and numerical methods.

Dr. Qiang Cheng is an Assistant Professor in the Department of Computer Science at Southern Illinois University Carbondale. He earned his Ph.D. from the University of Illinois at Urbana-Champaign, IL in 2002. Dr. Cheng’s research interests include pattern recognition, machine learning and signal processing.

Peer Review: This article has been internationally peer-reviewed and accepted for publication
according to the guidelines given at the journal’s website.
