SVM Quantile DNA
Support Vector Machine Quantile Regression for Detecting Differentially Expressed Genes in Microarray Analysis
I. Sohn1, S. Kim2, C. Hwang3, J. W. Lee1, J. Shim4
1 Department of Statistics, Korea University, Seoul, Korea
2 Skin Research Institute, AmorePacific R&D Center, Kyounggi-do, Korea
3 Division of Information and Computer Sciences, Dankook University, Kyounggi-do, Korea
4 Department of Applied Statistics, Catholic University of Daegu, Kyungbuk, Korea
Summary
Objectives: One of the main objectives of microarray analysis is to identify genes differentially expressed under two distinct experimental conditions. This task is complicated by the noisiness of the data and the large number of genes examined. Fold change (FC) based gene selection is often misleading because the error variability of each gene is heterogeneous across intensity ranges. Several statistical methods have been suggested, but some of them result in high false positive rates because they make very strong parametric assumptions.
Methods: We present support vector machine quantile regression (SVMQR) using an iterative reweighted least squares (IRWLS) procedure based on the Newton method instead of the usual quadratic programming algorithms. This procedure makes it possible to derive the generalized approximate cross validation (GACV) method for choosing the parameters that affect the performance of SVMQR. We propose a novel SVMQR-based method for identifying differentially expressed genes with a small number of replicated microarrays.
Results: We applied SVMQR to three biological datasets and a simulated dataset and showed that it performed more reliably and consistently than FC-based gene selection, Newton's method based on the posterior odds of change, or the nonparametric t-test variant implemented in significance analysis of microarrays (SAM).
Conclusions: The SVMQR method is an exploratory method for cDNA microarray experiments to identify genes with different expression levels between two types of samples (e.g., tumor versus normal tissue). The SVMQR method performed well in the situation where the error variability of each gene was heterogeneous across intensity ranges.
Keywords: cDNA microarray, support vector machine, support vector machine quantile regression
Methods Inf Med 2008; 47: 459-467
doi:10.3414/ME0396
Received: January 16, 2006; accepted: June 9, 2008
1. Introduction
The DNA microarray is a new tool in biotechnology that allows the simultaneous monitoring of the expression of thousands of genes in cells [1]. It has important applications in pharmaceutical and clinical research, including tumor classification, molecular pathway modeling, and functional genomics. One important and widely accepted use is the comparison of gene expression under two distinct experimental conditions (treated vs. untreated samples, diseased vs. normal tissue, mutant vs. wild-type organisms, etc.). In this kind of experimental setup, the main challenge is to determine which genes are differentially expressed across two tissue samples or samples obtained under two experimental conditions, i.e., to find the genes whose expression levels are strongly associated with the response of interest. In the early days, a simple fold change (FC) rule with an arbitrary cutoff point was applied to detect differentially expressed genes [2]. However, simply using an FC rule is known to be unreliable and inefficient [3]. Newton et al. [4] considered a hierarchical Gamma-Gamma-Bernoulli model, but its assumptions seem to be too strong for routine data analysis: they are clearly violated by the biological variation in a number of common experimental designs. Moreover, methods for identifying differentially expressed genes in microarray data where the error variability of each gene is heterogeneous over intensity ranges have not been investigated. Therefore, we propose support vector machine quantile regression (SVMQR), which builds on the support vector machine (SVM) and performs well on microarray data with heterogeneous error variability depending on signal intensity.

We introduce a nonparametric quantile regression method for identifying differentially expressed genes with a small number of replicated microarrays. Quantile regression, first introduced by Koenker and Bassett [5], is a popular method for estimating the quantiles of a distribution conditional on the values of covariates. Just as classical linear regression methods, which minimize the sum of squared residuals, enable one to estimate a wide variety of models for conditional mean functions, quantile regression methods enable one to estimate models for conditional quantile functions [6]. By supplementing the estimation of conditional mean functions with techniques for estimating an entire family of conditional quantile functions, quantile regression can potentially give a more complete statistical analysis of the stochastic relationships among random variables [6].

SVM was originally developed by Vapnik [7, 8] to solve classification problems, but its application has been extended to regression problems. SVM is based on the structural risk minimization (SRM) principle, which minimizes an upper bound on the expected risk, unlike traditional empirical risk minimization (ERM), which minimizes the error on the training data. We therefore believe that SVM regression will perform better for prediction and estimation of regression functions than neural networks [9] and multivariate adaptive regression splines (MARS) [10]. Detailed information about SVM regression can be found in Cristianini and Shawe-Taylor [11], Gunn [12], Smola and Schölkopf [13], and Vapnik [7, 8].
Methods Inf Med 5/2008
Downloaded from www.methods-online.com on 2011-12-17 | IP: 129.215.5.255 For personal or educational use only. No other uses without permission. All rights reserved.
In this article, we propose SVMQR using an IRWLS procedure based on the Newton method instead of the usual quadratic programming algorithms. We present an SVMQR method for identifying differentially expressed genes with a small number of replicated microarrays. We applied our SVMQR method to three real cDNA microarray datasets and a simulated dataset. We compared the performance of our method with that of the fold change (FC) rule, of Newton's method, and of the significance analysis of microarrays (SAM) method [14].

In the nonlinear quantile regression model, the quantile function of the response $y_i$ for a given $x_i$ is assumed to be nonlinearly related to the input vector $x_i \in R^d$. To allow for nonlinear quantile regression, the input vectors $x_i$ are nonlinearly transformed into a potentially higher-dimensional feature space $F$ by a nonlinear mapping function $\phi(\cdot)$. The quantile function of the response $y_i$ for a given $x_i$ can be given as

$$Q(\theta \mid x_i) = w_\theta^t \phi(x_i) \quad \text{for } \theta \in (0, 1), \tag{1}$$

where $w_\theta$ is the $\theta$th regression quantile. Here, similar to SVM for nonlinear regression, the nonlinear regression quantile estimator cannot be given in an explicit form, since we use the kernel function of the input vectors instead of the dot product of their feature mapping functions, except for the identity feature mapping function $\phi(x) = x$. Its estimator is defined as any solution to the optimization problem [6],

$$\min_{w_\theta} \sum_{i=1}^{n} \rho_\theta\bigl(y_i - w_\theta^t \phi(x_i)\bigr) \quad \text{for } \theta \in (0, 1), \tag{2}$$

where $\rho_\theta$ is the check function defined as $\rho_\theta(r) = \theta r I(r \ge 0) + (\theta - 1) r I(r < 0)$. Here $I(\cdot)$ is the indicator function, that is, $I(\text{true}) = 1$ and $I(\text{false}) = 0$.

Since quantile regression is in principle based on the absolute deviation loss, the procedure for the case $\epsilon = 0$ in a standard SVM is adopted to derive quantile regression using the idea of SVM. The quantile regression problem in the SVM formulation with $x_i = (1, x_i^t)^t$ can then be expressed as

$$\text{Minimize } \frac{1}{2} w_\theta^t w_\theta + C \sum_{i=1}^{n} \rho_\theta\bigl(y_i - w_\theta^t \phi(x_i)\bigr) \quad \text{for } \theta \in (0, 1). \tag{3}$$

The regularization parameter $C > 0$ determines the trade-off between the flatness of the quantile function estimate and the amount up to which deviations larger than 0 are tolerated. By introducing slack variables $\xi_i, \xi_i^*$, we can rewrite (3) as the following optimization problem,

$$\text{Minimize } \frac{1}{2} w_\theta^t w_\theta + C \sum_{i=1}^{n} \bigl(\theta \xi_i + (1 - \theta) \xi_i^*\bigr) \quad \text{for } \theta \in (0, 1), \tag{4}$$

subject to

$$y_i - w_\theta^t \phi(x_i) \le \xi_i, \quad w_\theta^t \phi(x_i) - y_i \le \xi_i^*, \quad \xi_i, \xi_i^* \ge 0,$$

where $\xi_i$ is the upper training error, $\xi_i^*$ is the lower training error, and $C > 0$ is the regularization parameter. Equation 4 corresponds to dealing with an absolute deviation loss function. Since the symmetry of the absolute value yields the median, giving different weights to positive and negative residuals yields the other quantiles: one minimizes a sum of asymmetrically weighted absolute residuals. This is exactly the quantile regression problem. Solving (4) under the constraints yields the $\theta$th sample quantile as its solution. The second term of Equation 4 is, in fact, the tilted absolute value function. The Lagrange function is constructed as can be seen in Figure 1.

Fig. 1 Equation 5: Lagrange function

Fig. 2
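The link between the check function and sample quantiles can be illustrated numerically. The following is a minimal sketch, not part of the original method: the function names and the toy data are hypothetical, and the search is restricted to the observed values, which is sufficient because a minimizer of the summed check loss over a constant is always attained at an observation.

```python
import numpy as np

def check_loss(r, theta):
    """Check function: theta * r for r >= 0 and (theta - 1) * r for r < 0."""
    r = np.asarray(r, dtype=float)
    return np.where(r >= 0, theta * r, (theta - 1.0) * r)

def quantile_by_check_loss(y, theta):
    """Return the observation c that minimizes sum_i check_loss(y_i - c, theta)."""
    y = np.asarray(y, dtype=float)
    totals = [check_loss(y - c, theta).sum() for c in y]
    return y[int(np.argmin(totals))]

# Hypothetical data: the integers 1, 2, ..., 101.
y_demo = np.arange(1.0, 102.0)
q25 = quantile_by_check_loss(y_demo, 0.25)  # lower quartile
q50 = quantile_by_check_loss(y_demo, 0.50)  # median
print(q25, q50)
```

With theta = 0.5 the two residual weights are equal and the minimizer is the median; moving theta away from 0.5 tilts the loss and shifts the minimizer to the corresponding quantile.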
Notice that the positivity constraints $\alpha_i, \alpha_i^*, \eta_i, \eta_i^* \ge 0$ should be satisfied. After taking partial derivatives of Equation 5 with regard to the primal variables $(w_\theta, \xi_i, \xi_i^*)$ and plugging them into Equation 5, the dual optimization problem with kernel function $K(\cdot, \cdot)$ is obtained as can be seen in Figure 2, subject to $\alpha_i \in [0, \theta C]$ and $\alpha_i^* \in [0, (1 - \theta) C]$. By writing $\alpha_i$ for $\alpha_i - \alpha_i^*$, it is possible to rewrite the above dual problem as follows:

$$\text{Maximize } -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j) + \sum_{i=1}^{n} \alpha_i y_i \tag{7}$$

subject to $\alpha_i \in [-(1 - \theta) C, \theta C]$. Solving the dual optimization problem with the constraints determines the optimal Lagrange multipliers $\alpha_i$. The $\theta$th regression quantile estimators and the $\theta$th quantile function predictors of the input vector $x$ are obtained, respectively, as follows:

$$w_\theta = \sum_{i=1}^{n} \alpha_i \phi(x_i) \quad \text{and} \quad Q(\theta \mid x) = \sum_{i=1}^{n} \alpha_i K(x_i, x). \tag{8}$$

Here, $w_\theta$ and $Q(\theta \mid x)$ depend implicitly on $\theta$ through $\alpha_i$, which depends on $\theta$. We use a Gaussian kernel function, which is the most commonly used and is defined as $K(x_i, x_j) = \exp(-\lVert x_i - x_j \rVert^2 / \sigma^2)$, where $\sigma$ is the kernel parameter. The kernel parameter will be determined by the generalized approximate cross validation (GACV).
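The box constraints on the multipliers follow from the stationarity conditions of the Lagrangian. The derivation below is a sketch under the standard SVM construction; it assumes the constraints of (4) are written as $y_i - w_\theta^t \phi(x_i) \le \xi_i$ and $w_\theta^t \phi(x_i) - y_i \le \xi_i^*$, and the exact notation of Figure 1 may differ.

```latex
% Assumed Lagrangian for (4), with multipliers \alpha_i, \alpha_i^*, \eta_i, \eta_i^* \ge 0
L = \frac{1}{2} w_\theta^t w_\theta
    + C \sum_{i=1}^{n} \bigl(\theta \xi_i + (1-\theta)\xi_i^*\bigr)
    - \sum_{i=1}^{n} \alpha_i \bigl(\xi_i - y_i + w_\theta^t \phi(x_i)\bigr)
    - \sum_{i=1}^{n} \alpha_i^* \bigl(\xi_i^* + y_i - w_\theta^t \phi(x_i)\bigr)
    - \sum_{i=1}^{n} \bigl(\eta_i \xi_i + \eta_i^* \xi_i^*\bigr)

% Stationarity in the primal variables
\frac{\partial L}{\partial w_\theta} = 0
  \;\Rightarrow\; w_\theta = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\,\phi(x_i)
\frac{\partial L}{\partial \xi_i} = 0
  \;\Rightarrow\; \theta C - \alpha_i - \eta_i = 0
  \;\Rightarrow\; 0 \le \alpha_i \le \theta C
\frac{\partial L}{\partial \xi_i^*} = 0
  \;\Rightarrow\; (1-\theta) C - \alpha_i^* - \eta_i^* = 0
  \;\Rightarrow\; 0 \le \alpha_i^* \le (1-\theta) C

% Substituting back eliminates the slack variables and gives the dual objective
-\frac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j)
  + \sum_{i} (\alpha_i - \alpha_i^*)\, y_i
```

Because the objective depends on the multipliers only through the difference $\alpha_i - \alpha_i^*$, the two sets can be merged into a single multiplier ranging over $[-(1-\theta)C, \theta C]$.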
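Nychka et al. [16] suggested employing the modified check function $\rho_{\theta,\epsilon}$ instead, which differs from $\rho_\theta$ only in the region $(-\epsilon, \epsilon)$, where

$$\rho_{\theta,\epsilon}(r) = \begin{cases} \theta r^2 / \epsilon, & 0 \le r < \epsilon, \\ (1 - \theta) r^2 / \epsilon, & -\epsilon < r < 0. \end{cases} \tag{9}$$

By setting $\epsilon$ small enough, we can get a good approximate solution to (3). Substituting (8) and (9) into (3) yields the problem of obtaining $\alpha$ by minimizing

$$L(\alpha) = \frac{1}{2} \alpha^t K \alpha + C \sum_{i=1}^{n} \rho_{\theta,\epsilon}(y_i - K_i \alpha), \tag{10}$$

where $K_i$ is the $i$th row of the kernel matrix $K$. Taking partial derivatives of (10) with regard to $\alpha$ leads to the optimal values of $\alpha$ being the solution to

$$0 = K\alpha - CKWy + CKWK\alpha. \tag{11}$$

Here $W$ is a diagonal matrix with the $i$th diagonal element obtained from the derivative of the modified check function as

$$W_{ii} = \begin{cases} \theta / r_i, & r_i \ge \epsilon, \\ 2\theta / \epsilon, & 0 \le r_i < \epsilon, \\ 2(1 - \theta) / \epsilon, & -\epsilon < r_i < 0, \\ (\theta - 1) / r_i, & r_i \le -\epsilon, \end{cases} \tag{12}$$

where $r_i = y_i - K_i \alpha$. The solution to (11) cannot be obtained in a single step since $W$ itself contains $\alpha$. Thus we need to apply the IRWLS procedure, which starts with initialized values of $\alpha$ and proceeds as follows:
1) Calculate $W$ with $\alpha$.
2) Calculate $\alpha$ from $\alpha = (K/C + KWK)^{-1} KWy$.
3) Iterate steps 1) and 2) until convergence.

The problem of choosing the smoothing parameters is ubiquitous in function estimation. Thus we now illustrate the model selection method which chooses the appropriate parameters of SVMQR. The functional structure of SVMQR is characterized by the penalty constant $C$ and the kernel parameter $\sigma$. To choose the parameters of SVMQR we first need to consider the cross validation (CV) function as follows:

$$CV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \rho_\theta\bigl(y_i - Q^{(-i)}(\theta \mid x_i)\bigr), \tag{13}$$

where $\lambda$ is the set of parameters and $Q^{(-i)}(\theta \mid x_i)$ is the quantile function estimated without the $i$th observation. But the computational cost associated with the CV function is formidable, since for each candidate set of parameters $Q^{(-i)}(\theta \mid x_i)$ must be evaluated for $i = 1, ..., n$. Thus we adopt the GACV derived by Yuan [17] as a remedy to CV. Yuan [17] proposed GACV for the selection of the smoothing parameter for quantile smoothing spline estimates,

$$GACV(\lambda) = \frac{\sum_{i=1}^{n} \rho_\theta\bigl(y_i - Q(\theta \mid x_i)\bigr)}{n - \mathrm{trace}(H)}, \tag{14}$$

where $H$ is the hat matrix such that $Q(\theta \mid x) = Hy$ with the $(i, j)$th element $h_{ij} = \partial Q(\theta \mid x_i) / \partial y_j$. This GACV function cannot be applied to SVMQR using QP since the hat matrix $H$ is not computable. But it can be applied to SVMQR using IRWLS since $H$ can be obtained from (8) and (11) as follows:

$$H = K(K/C + KWK)^{-1} KW. \tag{15}$$

Thus our proposed GACV is given as

$$GACV(\lambda) = \frac{\sum_{i=1}^{n} \rho_\theta\bigl(y_i - Q(\theta \mid x_i)\bigr)}{n - \mathrm{trace}(H)}, \tag{16}$$

where $\lambda$ is the set of the penalty constant and the kernel parameter and $H$ is the hat matrix in (15).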
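The IRWLS iteration and the GACV score can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the toy data, the Gaussian kernel scaling, the weight formula (derivative of the modified check function divided by the residual) and the values of C, sigma and eps are all hypothetical choices. For numerical convenience the update solves (I/C + WK) alpha = Wy, which is algebraically the same as alpha = (K/C + KWK)^{-1} KWy when K is nonsingular.

```python
import numpy as np

def check_loss(r, theta):
    """Check function rho_theta(r)."""
    return np.where(r >= 0, theta * r, (theta - 1.0) * r)

def gaussian_kernel(X1, X2, sigma):
    """K[i, j] = exp(-||x1_i - x2_j||^2 / sigma^2); the scaling is an assumption."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def irwls_weights(r, theta, eps):
    """Diagonal of W: derivative of the modified check function divided by r."""
    w = np.empty_like(r)
    pos_big, neg_big = r >= eps, r <= -eps
    pos_small = (r >= 0) & (r < eps)
    neg_small = (r < 0) & (r > -eps)
    w[pos_big] = theta / r[pos_big]
    w[neg_big] = (theta - 1.0) / r[neg_big]
    w[pos_small] = 2.0 * theta / eps
    w[neg_small] = 2.0 * (1.0 - theta) / eps
    return w

def fit_svmqr(K, y, theta, C, eps=1e-3, max_iter=100, tol=1e-6):
    """Run IRWLS; returns alpha and the weights used for the last update."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(max_iter):
        w = irwls_weights(y - K @ alpha, theta, eps)   # step 1: W from alpha
        A = np.eye(n) / C + w[:, None] * K             # I/C + W K
        new_alpha = np.linalg.solve(A, w * y)          # step 2: update alpha
        converged = np.max(np.abs(new_alpha - alpha)) < tol
        alpha = new_alpha
        if converged:                                  # step 3: stop when stable
            break
    return alpha, w

def gacv(K, y, alpha, w, theta, C):
    """GACV score sum(rho) / (n - trace(H)) and the hat matrix H."""
    n = len(y)
    A = np.eye(n) / C + w[:, None] * K
    H = K @ np.linalg.solve(A, np.diag(w))             # K (I/C + W K)^{-1} W
    loss = check_loss(y - K @ alpha, theta).sum()
    return loss / (n - np.trace(H)), H

# Hypothetical MA-plot-like data: noise grows as intensity decreases.
rng = np.random.default_rng(0)
x = np.linspace(6.0, 14.0, 60)[:, None]
y = rng.normal(scale=2.0 / x[:, 0])
K = gaussian_kernel(x, x, sigma=2.0)
alpha, w = fit_svmqr(K, y, theta=0.975, C=10.0)
score, H = gacv(K, y, alpha, w, theta=0.975, C=10.0)
upper_curve = K @ alpha                                # fitted 0.975 quantile curve
```

In practice one would evaluate `gacv` over a grid of candidate (C, sigma) pairs and keep the pair with the smallest score; convergence of the iteration is not guaranteed for every setting, so the iteration cap is kept.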
pression in microarray data. For a gene (spot), let R and G denote the measured fluorescence intensities for the red and green dyes, respectively. The gene expression data consist of log-intensity ratios $M_{ij} = \log_2(R_{ij}/G_{ij})$, where $i = 1, 2, \ldots, p$ (genes) and $j = 1, 2, \ldots, n$ (samples). Denote the mean and the standard deviation of $M_{ij}$ for gene $i$ as $\bar{M}_i$ and $s_i$, respectively.
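These quantities are straightforward to compute from the raw intensities. The sketch below uses hypothetical intensity values and function names; the average log-intensity A = log2 sqrt(RG) is the usual MA-plot definition and is included as an assumption, since only M is defined in the paragraph above.

```python
import numpy as np

def ma_values(R, G):
    """Log-ratio M = log2(R/G) and average log-intensity A = log2(sqrt(R*G))."""
    R = np.asarray(R, dtype=float)
    G = np.asarray(G, dtype=float)
    M = np.log2(R / G)
    A = 0.5 * np.log2(R * G)
    return M, A

# Hypothetical intensities: rows are genes (i), columns are replicated samples (j).
R = np.array([[8.0, 4.0], [2.0, 2.0]])
G = np.array([[2.0, 1.0], [2.0, 8.0]])
M, A = ma_values(R, G)
M_bar = M.mean(axis=1)        # per-gene mean of M_ij
s = M.std(axis=1, ddof=1)     # per-gene sample standard deviation of M_ij
```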
2.3 Datasets
2.3.1 Microarray Dataset of a Diet-induced Obese (DIO) Mouse Model
The experimental group consisted of six mice fed a high-fat diet (HFD) for 12 weeks. The control group consisted of six age/weight-matched mice fed a low-fat diet (LFD) for 12 weeks. Equal amounts of RNA from the six mice of each group were pooled. Each sample was equally divided. One half was used to generate Cy3-labeled cDNA. The other half was used to generate Cy5-labeled cDNA for dye swapping. Six replicates of hybridization were performed. Three of these were repeated with the fluorophores reversed to prevent dye bias. The Cy5 and Cy3 probes were mixed and hybridized to a microarray containing 10,336 cDNA probes. Probes were spotted onto glass slides using a 4 × 8 print head. Two fluorescent images (Cy3 and Cy5) were scanned separately using a GMS 418 Array Scanner (Affymetrix, Santa Clara, CA, USA). Signal intensity values were obtained from the ImaGene 4.2 (Biodiscovery, Santa Monica, CA, USA) and the MAAS (Gaiagene, Seoul, Korea) software applications. At first, 3996 genes were excluded by the following criteria:
1) The PCR amplification of the sequence spotted on the array was deemed acceptable only if the amplification was confirmed and a single size product was obtained.
2) Accurate printing of each spot was required, as shown by an emission signal from more than 40% of the spot area.
3) The signal from the fluorophore labels had to be higher than 28.
The datasets were further processed by print-tip-dependent normalization and dye-swap normalization. The k-Nearest Neighbor (KNN) method was used to fill in the missing values of the datasets. The final output datasets were composed of 6340 genes. Sohn et al. [15] used the microarray dataset of a diet-induced obese mouse model.
3. Results
In this section, we apply our SVMQR method to three real mouse cDNA microarray datasets and a simulated dataset. We compare the performance of our method with that of the fold change (FC) rule, of Newton's method, and of the significance analysis of microarrays (SAM) method [14].
Fig. 3 An MA plot comparing HFD vs. LFD groups. M represents the log ratio of the two fluorescent dyes used to label probes. A represents the averaged logarithmic intensity. The SVMQR curves represent θ = 0.025 and θ = 0.975, respectively. The log posterior odds of change of 1:1, 10:1, and 100:1 are indicated as 0, 1, and 2, respectively.
The MA plot for the DIO mouse data is shown in Figure 3, where the log-ratios are given by $M = \log_2(R/G)$ and the average log-intensity by $A = \log_2\sqrt{RG}$. Figure 3 shows the upper and lower quantiles of the SVMQR method, the twofold change, and the contours for Newton's method. The upper and lower curves stand for θ = 0.975 and θ = 0.025, respectively. The log posterior odds of change of 1:1, 10:1, and 100:1 of Newton's method are indicated as 0, 1, and 2, respectively. This MA plot shows a tendency of increasing dispersion of the log-ratio M as the spot intensity, A, decreases. The SVMQR lines have narrower spacing in the lower ranges of intensity, but have wider spacing in the higher ranges of intensity. The conditional distribution of the log-ratio M may be asymmetric and heteroscedastic.

The number of significant genes from at least one of three slides by three different methods and the number of repeated detections are shown in Table 1. It can be seen that the FC and the SVMQR detect about the same number of differentially expressed spots when the upper and lower quantiles for the data were θ = 0.975 and θ = 0.025, respectively, and the fold change cutoff was twofold. According to the repeat recovery rate, i.e., the percentage of spots that are also identified as differentially regulated in the corresponding spots of the second and third slides, the performance of the SVMQR was slightly better; that is, it detected differentially expressed genes more consistently in the three repeated slides. In FC, this rate was 8.2%, which means that about 8% of the detected genes were found simultaneously in the three repeated slides. The corresponding rate in the SVMQR is 12.6%. The Newton method identified only a few significant genes with three replicates. It was not able to identify many of the differentially expressed genes that were detected by the FC or SVMQR method.

Figure 4 is the Venn diagram showing the number of genes identified as differentially regulated by the three methods in the DIO mouse model. The number of significant genes selected by the SVMQR and FC methods was 96 and 106, respectively. Sixty-nine genes were commonly selected as significant by both methods. The lists of significant genes selected by the SVMQR or FC method are presented in Table 2.

Next, we assessed the quality of the results using previously established biological knowledge. According to their biological function, several interesting and important genes were identified by our SVMQR method. Cytochrome P450, family 4, subfamily a, polypeptide 14 (Mm.250901) is a good example. Previous studies showed that this gene is likely to be functionally relevant for a DIO mouse model [21, 22]. This gene was found to be differentially regulated in the DIO mouse model only by the SVMQR method. The genes involved in metabolism, such as glycerol-3-phosphate acyltransferase, mitochondrial, lactate dehydrogenase 1, A chain, and glucose phosphate isomerase 1 complex, were also found to be differentially expressed in the DIO mouse model by the SVMQR method, but were missed by the FC.

Table 1 The number of significant genes from at least one of three slides using three different methods and the number of repeated detections in the analysis

The diet-induced obese mouse model

| Method | Cut-off threshold | Number of significant genes from at least one of three slides | Repeated detection in three slides (repeat recovery rate) |
|---|---|---|---|
| SVMQR | θ > 0.975 or θ < 0.025 | 759 | 96 (12.6%) |
| FC | fold change > 2 | 1286 | 106 (8.2%) |
| Newton | odds values > 0 | 145 | 12 (8.2%) |

| Method | Cut-off threshold | Number of significant genes from at least one of three slides | Repeated detection in three slides (repeat recovery rate) |
|---|---|---|---|
| SVMQR | | | 98 (12.7%) |
| FC | | 984 | 90 (9.1%) |
| Newton | | 263 | 25 (9.0%) |
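The repeat recovery rate used in Table 1 is the number of genes detected in every slide divided by the number detected in at least one slide. A minimal sketch follows; the function name and the toy detection calls are hypothetical, and only the reported counts (96 of 759 for SVMQR, 106 of 1286 for FC) come from the text.

```python
import numpy as np

def repeat_recovery_rate(calls):
    """calls: boolean array of shape (n_slides, n_genes), True = detected.

    Returns (detected in all slides, detected in at least one slide, rate).
    """
    calls = np.asarray(calls, dtype=bool)
    in_any = int(calls.any(axis=0).sum())
    in_all = int(calls.all(axis=0).sum())
    return in_all, in_any, in_all / in_any

# Hypothetical calls for 5 genes on 3 slides.
toy_calls = [[1, 1, 0, 0, 1],
             [1, 0, 1, 0, 1],
             [1, 0, 0, 0, 0]]
toy_all, toy_any, toy_rate = repeat_recovery_rate(toy_calls)

# Reproducing the reported percentages from the reported counts.
svmqr_rate = round(100 * 96 / 759, 1)
fc_rate = round(100 * 106 / 1286, 1)
```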
Fig. 4 A comparison among three methods using the microarray dataset of the diet-induced obese mouse model. A Venn diagram shows the number of genes identified by each method when using a cutoff of twofold for the FC method, θ = 0.975 and θ = 0.025 for the SVMQR method, and posterior odds values > 0 for the Newton method.
Fig. 5 A comparison among three methods using the microarray dataset of the E. coli model. A Venn diagram shows the number of genes identified by each method when using a cutoff of twofold for the FC method, θ = 0.975 and θ = 0.025 for the SVMQR method, and posterior odds values > 0 for the Newton method.
Fig. 6 A comparison among the three methods using the microarray dataset of the HDL-deficient mouse model. A Venn diagram shows the number of genes identified by each method when using a cutoff of twofold for the FC method, θ = 0.975 and θ = 0.025 for the SVMQR method, and selecting the top 22 significant genes for the SAM. The list of genes identified by each method is presented in Table 3.
Table 2 The number of significant genes from at least one of two slides using three different methods and the number of repeated detections in the analysis on E. coli data

The E. coli model

| Method | Cut-off threshold | Number of significant genes from at least one of two slides | Repeated detection in two slides (repeat recovery rate) |
|---|---|---|---|
| SVMQR | | | 117 (33.3%) |
| FC | | 460 | 96 (20.7%) |
| Newton | | 18 | 1 (5.5%) |
1/3000. Red (R) and green (G) channel intensities for each gene are simulated from a normal distribution with mean equal to the true expression signal and standard deviation equal to 15% of the true expression signal. 2) We select 10% of the genes to be either over- or under-expressed. The selected genes have a targeted expression ratio generated by $t = 10^{b}$,
Table 3 A comparison of the differentially expressed genes in an HDL-deficient mouse model identified by the support vector machine quantile regression, fold change, and SAM methods. The abbreviations used are: support vector machine quantile regression (Q), fold change (FC), and significance analysis of microarrays (SAM) methods. * The genes were confirmed by biological methods.

| ID | NAME |
|---|---|
| 540 | EST, Highly similar to APOLIPOPROTEIN A-I PRECURSOR [Mus musculus], lipid-UG* |
| 2149 | Apo AI, lipid-Img* |
| 2537 | ESTs, Highly similar to APOLIPOPROTEIN C-III PRECURSOR [Mus musculus], lipid-UG* |
| 4139 | EST, Weakly similar to C-5 STEROL DESATURASE [S. cerevisiae], lipid-UG* |
| 1496 | EST* |
| 5356 | CATECHOL O-METHYLTRANSFERASE, MEMBRANE-BOUND FORM, Brain-Img* |
| 4941 | EST, Similar to yeast sterol desaturase, lipid-Img* |
| 1337 | Psoriasis-associated fatty acid binding protein, lipid-Img |
| 2932 | Mus musculus long chain fatty acyl CoA synthetase mRNA, complete cds, lipid-UG |
| 2989 | MDB0368 |
| 4390 | Cy3RT |
| 4533 | EST, Highly similar to CALCINEURIN B SUBUNIT [Drosophila melanogaster], heart-UG |
| 4942 | ORPHAN NUCLEAR RECEPTOR OF STEROID/THYROID SUPERFAMILY, Brain-Img |
| 5188 | Cy3RT |
| 5249 | 5' similar to SW:ACT_VOLCA P20904 ACTIN. gi|2186634|gb|AA461743|AA461743 [2186634] |
| 5731 | Mus musculus paraoxonase-3 (Pon3) mRNA, complete cds, lipid-UG |
| 6050 | 5'. gi|2186640|gb|AA461749|AA461749 [2186640] |
| 6117 | EST, Highly similar to CATECHOL O-METHYLTRANSFERASE, MEMBRANE-BOUND FORM [R. norvegicus], Brain-UG |
| 6134 | Mouse MAPK mRNA for mitogen-activated protein kinase (p42), heart-UG |
Fig. 7 Two patterns for the simulation study. Left panels show the MA plots before Lowess normalization, and right panels show the MA plots after Lowess normalization. Marked points denote differentially expressed genes.
where b follows a beta distribution, $b \sim B(1.7, 4.8)$. R and G intensities of these genes are then converted by R = R t and G = G/t. 3) In order to transform these intensities to nonlinear patterns, the R and G intensities of all genes are converted by the channel-specific transformations of Equation 17.

Two patterns were considered. The first pattern is a sinusoid shape with parameters $(a_0, a_1, a_2, a_3) = (0, 100^{1/0.9}, 0.9, 1)$ and $(0, 100^{1/0.7}, 0.7, 1)$ in Equation 17 (see Fig. 7a). The second pattern is a banana shape with parameters $(a_0, a_1, a_2, a_3) = (0, 10, 1, 1)$ and $(0, 500, 1, 1)$ (see Fig. 7c).

To investigate the relative performance of SVMQR and Newton's method, we carried out an external validation using the simulated data. We considered two models. In the first model (Model 1 in Table 4), the training dataset has a sinusoid shape and the second dataset has a banana shape. In the second model (Model 2 in Table 4), the training dataset has a banana shape and the second dataset has a sinusoid shape. We generated 500 genes and applied Lowess normalization [29]. The training dataset is used to fit the models and the second dataset is used to estimate the true predictive performance [30]. This procedure was repeated 100 times. We compared both the average sensitivity and the average specificity on the simulated data. We selected C and σ using the GACV function for θ = 0.975 and θ = 0.025, respectively. For Newton's method, we selected genes whose posterior odds values were higher than 0. Table 4 shows the average number of genes selected, the average number of true genes selected, the average sensitivity, and the average specificity on the simulated data. As shown in Table 4, the SVMQR method gives a higher average sensitivity but a slightly lower average specificity than Newton's method. Although its specificity is slightly lower, Newton's method missed many true genes.

Table 4 The average number of genes selected, the average number of true genes selected, average sensitivity, and average specificity from the simulated data

| | Model 1 SVMQR | Model 1 Newton | Model 2 SVMQR | Model 2 Newton |
|---|---|---|---|---|
| Average number of genes selected | 50.76 | 18.76 | 46.64 | 18.72 |
| Average number of true genes selected | 22.96 | 16.48 | 23.8 | 17.58 |
| Average sensitivity | 0.45 | 0.32 | 0.47 | 0.35 |
| Average specificity | 0.94 | 0.99 | 0.94 | 0.99 |

4. Discussion
In this paper, we proposed support vector machine quantile regression (SVMQR) using an iterative reweighted least squares (IRWLS) procedure based on the Newton method, and a new SVMQR-based method for identifying differentially expressed genes with a small number of replicated microarrays. In microarray studies, gene selection based on fold change (FC) values is often misleading, especially when the error variability of each gene is heterogeneous over the intensity ranges. The FC values calculated from the measured intensity levels may give a different interpretation for a gene whose absolute expression level is low. Earlier methods, such as those of Chen et al. [3] and Newton et al. [4], are based on assumed parametric models (e.g. Gamma or Gaussian) for the (R, G) intensities, but these assumptions
seem to be too strong for routine data analysis use. In contrast, our SVMQR method estimates the θth quantile of the log-ratios ($M = \log_2(R/G)$) given the average log-intensity ($A = \log_2\sqrt{RG}$). Therefore, if we use the information on the quantiles of the log-ratios for identifying differentially expressed genes in data with heteroscedasticity (Fig. 3), the SVMQR method performs much better than the fold change, which uses only the absolute log-ratios, and it does not need parametric assumptions. The SVMQR method is an exploratory method for cDNA microarray experiments to identify genes with different expression levels between two types of samples (e.g., tumor versus normal tissue). The SVMQR method performed well in the situation where the error variability of each gene was heterogeneous across intensity ranges.
Acknowledgements
This work was supported by a Korea Science and Engineering Foundation Grant (R14-2003-002-01002-0).
References
1. Brown PO, Botstein D. Exploring the new world of the genome with DNA microarrays. The chipping forecast 1999; 21: 33-37.
2. DeRisi JL, Iyer VR, Brown PO. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 1997; 278: 680-686.
3. Chen Y, Dougherty ER, Bittner ML. Ratio-based decisions and the quantitative analysis of cDNA microarray image. Biomedical Optics 1997; 2: 364-374.
4. Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW. On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J of Com Bio 2001; 8: 37-52.
5. Koenker R, Bassett G. Regression Quantiles. Econometrica 1978; 46: 33-50.
6. Koenker R, Xiao Z. Inference on the quantile regression process. Econometrica 2002; 70 (4): 1583-1612.
7. Vapnik VN. The nature of statistical learning theory. New York: Springer; 1995.
8. Vapnik VN. Statistical Learning Theory. New York: Springer; 1998.
9. Gunn SR, Brown M, Bossley KM. Network performance assessment for neurofuzzy data modelling. Lecture Notes in Computer Science 1997; 208: 313-323.
10. Ripley BD. Neural networks and related methods for classification. Journal of Royal Statistical Society 1994; 56: 409-456.
11. Cristianini N, Shawe-Taylor J. Support Vector Regression. Cambridge University Press; 2000.
12. Gunn S. Support Vector Machines for Classification and Regression. ISIS Technical Report, University of Southampton; 1998.
13. Smola A, Schölkopf B. On a Kernel-Based Method for Pattern Recognition, Regression, Approximation and Operator Inversion. Algorithmica 1998; 22: 211-231.
14. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci 2001; 98: 5116-5121.
15. Sohn I, Kim S, Hwang C, Lee JW. New normalization methods using support vector machine quantile regression approach in microarray analysis. Computational Statistics and Data Analysis. In press.
16. Nychka D, Gray G, Haaland P, Martin D, O'Connell M. A Nonparametric Regression Approach to Syringe Grading for Quality Improvement. Journal of the American Statistical Association 1995; 90: 1171-1178.
17. Yuan M. GACV for quantile smoothing splines. Computational Statistics and Data Analysis 2006; 50: 813-829.
18. Richmond CS, Glasner JD, Mau R, Jin H, Blattner FR. Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res 1999; 27 (19): 3821-3835.
19. Dudoit S, Yang YH, Speed TP, Callow MJ. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica 2002; 12: 111-140.
20. Kim S, Sohn I, Ahn J-I, Lee K-H, Lee Y-S, Lee Y-S. Hepatic gene expression profile in long-term high-fat diet-induced obesity mouse model. Gene 2004; 340: 99-109.
21. Becker W, Kluge R, Kantner T, Linnartz K, Korn M, Tschank G, Plum L, Giesen K, Joost HG. Differential hepatic gene expression in a polygenic mouse model with insulin resistance and hyperglycemia: evidence for a combined transcriptional dysregulation of gluconeogenesis and fatty acid synthesis. J Mol Endocrinol 2004; 32: 195-208.
22. Enriquez A, Leclercq I, Farrell GC, Robertson G. Altered expression of hepatic CYP2E1 and CYP4A in obese, diabetic ob/ob mice, and fa/fa Zucker rats. Biochem Biophys Res Commun 1995; 255: 300-306.
23. Callow MJ, Dudoit S, Gong EL, Speed TP, Rubin EM. Microarray Expression Profiling Identifies Genes with Altered Expression in HDL-Deficient Mice. Genome Research 2000; 10: 2022-2029.
24. Memon RA, Fuller J, Moser AH, Smith PJ, Grunfeld C, Feingold KR. Regulation of putative fatty acid transporters and Acyl-CoA synthetase in liver and adipose tissue in ob/ob mice. Diabetes 1999; 48: 121-127.
25. Malewiak MI, Griglio S, Le Liepvre X. Relationship between lipogenesis, ketogenesis, and malonyl-CoA content in isolated hepatocytes from the obese Zucker rat adapted to a high-fat diet. Metabolism 1985; 34: 604-611.
26. Balagurunathan Y, Dougherty E, Chen Y, Bittner M, Trent J. Simulation of cDNA microarrays via a parameterized random signal model. Journal of Biomedical Optics 2002; 7: 507-523.
27. Fujita A, Sato JR, de Oliveira Rodrigues L, Ferreira CE, Sogayar MC. Evaluating different methods of microarray data normalization. BMC Bioinformatics 2006; 7: 469.
28. Haldermans P, Shkedy Z, Sanden SV, Burzykowski T, Aerts M. Using Linear Mixed Models for Normalization of cDNA Microarrays. Statistical Applications in Genetics and Molecular Biology 2007; 6 (1): 1-23.
29. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 2002; 30 (4): e15.
30. König IR, Malley JD, Weimar C, Diener H-C, Ziegler A. Practical experiences on the necessity of external validation. Statist Med 2007. In press.
Correspondence to:
Sujong Kim
Skin Research Institute
AmorePacific R&D Center
314-1 Sanggal-dong, Kiheung-gu, Yongin-si
Kyounggi-do 449-729
Korea
E-mail: sundance@amorepacific.com