A model-based relevance estimation
approach for feature selection in
microarray datasets
Gianluca Bontempi, Patrick E. Meyer
{gbonte,pmeyer}@ulb.ac.be
Machine Learning Group,
Computer Science Department
ULB, Université Libre de Bruxelles
Boulevard de Triomphe - CP 212
Bruxelles, Belgium
http://www.ulb.ac.be/di/mlg

Outline
• Feature selection in microarray classification tasks
• Definition of relevance
• Relevance and feature selection
• Our approach to relevance estimation: between filter and wrapper
• Experimental results

Feature selection in microarrays
• The availability of massive amounts of experimental data based on genome-wide studies has given impetus in recent years to a large effort in developing mathematical, statistical and computational techniques to infer biological models from data.
• In many bioinformatics problems, the number of features is significantly larger than the number of samples (high feature-to-sample ratio datasets).
• This is typical of cancer classification tasks, where a systematic investigation of the correlation of the expression patterns of thousands of genes with specific phenotypic variations is expected to provide an improved taxonomy of cancer.
• In this context, the number of features n corresponds to the number of expressed gene probes (up to several thousand) and the number of observations N to the number of tumor samples (typically on the order of hundreds).
• Feature selection, and consequently gene selection, is required to perform classification in such a high-dimensional task.

State-of-the-art
Feature selection requires an accurate assessment of a large number of alternative subsets in terms of predictive power or relevance to the output class.
The three main state-of-the-art approaches are:
Filters: preprocessing methods which assess the merits of features from the data without having recourse to any learning algorithm. Examples: ranking, PCA, t-test.
Wrappers: methods that rely on a learning algorithm to assess and compare subsets of variables. They conduct a search for a good subset using the learning algorithm itself as part of the evaluation function. Examples are the forward/backward methods proposed in classical regression analysis.
Embedded methods: these perform variable selection as part of the learning procedure and are usually specific to a given learning machine. Examples are classification trees and methods based on regularization techniques (e.g. the lasso).

Between filters and wrappers
• Filter approaches rely on learner-independent estimators to assess the relevance of a set of features. The rationale of filter techniques is that the importance of a set of features should be independent of the prediction technique.
Our contribution: we propose a model-based strategy to assess the relevance of a set of features.
• Wrappers depend on a specific learner to assess a set of features and end up returning a quantity which confounds the relevance of a subset (the desired quantity) with the quality of the learner (not required). In other terms, wrappers return a biased estimate of the relevance of a subset.
Our contribution: since the wrapper bias may have a strong negative impact on the selection procedure, we propose a model-based technique for relevance assessment which is low-biased.

Feature selection and relevance
• Let us consider a binary classification problem where $x \in \mathcal{X} \subset \mathbb{R}^n$ and $y \in \mathcal{Y} = \{y_0, y_1\}$. Let $s \subseteq x$, $s \in \mathcal{S}$, be a subset of the input vector.
• Let us denote
$$p_1(s) = \text{Prob}\{y = y_1 \mid s\}, \qquad p_0(s) = \text{Prob}\{y = y_0 \mid s\}$$
• A feature selection problem can be formalized as a problem of (learner-independent) relevance maximization
$$s^* = \arg\max_{s \subseteq x,\, |s| \le d} R_s$$
where the goal is to find the subset $s$ that maximizes the relevance quantity $R_s$, which accounts for the predictive power that the input $s$ has on the target $y$.

Relevance definitions
• A well-known example of a relevance measure is the mutual information $I(s; y) = H(y) - H(y \mid s)$.
• Here we will focus on the quantity
$$R_s = \int_{\mathcal{S}} \left[ p_0^2(s) + p_1^2(s) \right] dF_s(s) = \int_{\mathcal{S}} r(s)\, dF_s(s)$$
where $r(s) = 1 - g(s)$ and $g(s)$ is the Gini index of diversity.
• Note that
$$r(s) = p_0^2(s) + p_1^2(s) = 1 - 2 p_0(s)\big(1 - p_0(s)\big) = 1 - 2 \text{Var}\{y \mid s\}$$
where $\text{Var}\{y \mid s\}$ is the conditional variance of $y$.
• Also, a monotone function $G_H(\cdot) : [0, 1] \to [0, 0.5]$ maps the entropy $H(y \mid s)$ of a binary variable $y$ to the related Gini index $g$.
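To make these quantities concrete, here is a small Python sketch (ours, not from the slides; all names are illustrative) that computes $r(s) = 1 - 2p_0(1 - p_0)$ and a numerical version of the map $G_H$, obtained by inverting the binary entropy via bisection:

```python
import numpy as np

def binary_entropy(p):
    """Entropy (in bits) of a binary variable with P(y = y1) = p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def gini(p):
    """Gini index g = 2 p (1 - p) of a binary variable."""
    return 2 * p * (1 - p)

def G_H(h, tol=1e-12):
    """Monotone map G_H : [0, 1] -> [0, 0.5] from entropy to Gini.
    Both H and g are monotone in min(p, 1 - p), so we invert H by
    bisection on p in [0, 0.5] and evaluate g at the recovered p."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return gini(0.5 * (lo + hi))

# Example: p0 = 0.9 gives r(s) = 1 - 2*0.9*0.1 = 0.82, i.e. g(s) = 0.18
p0 = 0.9
print(1 - 2 * p0 * (1 - p0))       # 0.82
print(G_H(binary_entropy(p0)))     # ~0.18, consistent with g = 1 - r
```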


Bias of the wrapper approach
Given a learner $h$ trained on a dataset of size $N$, the wrapper approach translates the (learner-independent) relevance maximization problem into a (learner-dependent) minimization problem
$$\arg\min_{s \subseteq x,\, |s| \le d} M^h_s = \arg\min_{s \subseteq x,\, |s| \le d} \int_{\mathcal{S}} \text{MME}^h(s)\, dF_s(s)$$
where the Mean Misclassification Error is decomposed as follows (Wolpert, Kohavi, 96):
$$\text{MME}^h(s) = \frac{1}{2}\Big[1 - \big(p_0^2(s) + p_1^2(s)\big)\Big] + \frac{1}{2}\Big[\big(p_0(s) - \hat{p}_0(s)\big)^2 + \big(p_1(s) - \hat{p}_1(s)\big)^2\Big] + \frac{1}{2}\Big[1 - \big(\hat{p}_0^2(s) + \hat{p}_1^2(s)\big)\Big] = \frac{1}{2}\big(n(s) + b(s) + v(s)\big)$$
where $\hat{p}_0 = \text{Prob}\{\hat{y} = y_0 \mid s\}$, $n(s) = 1 - r(s)$ is the noise variance term, $b(s)$ is the learner squared bias and $v(s)$ is the learner variance.
NB: the term $b(s)$ is NOT dependent on relevance.
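This decomposition is an algebraic identity for a stochastic classifier whose prediction $\hat{y}$ is drawn independently of $y$ given $s$ (so that $\text{MME}^h(s) = p_0\hat{p}_1 + p_1\hat{p}_0$). The short numeric check below is our own sanity check, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary conditional probabilities p0(s) and model estimates phat0(s)
p0, phat0 = rng.uniform(size=5), rng.uniform(size=5)
p1, phat1 = 1 - p0, 1 - phat0

# Misclassification probability of the stochastic classifier:
mme = p0 * phat1 + p1 * phat0

noise = 1 - (p0**2 + p1**2)                 # n(s) = 1 - r(s)
bias  = (p0 - phat0)**2 + (p1 - phat1)**2   # b(s), learner squared bias
var   = 1 - (phat0**2 + phat1**2)           # v(s), learner variance

assert np.allclose(mme, 0.5 * (noise + bias + var))  # identity holds
```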

Bias of the wrapper approach
• In real classification tasks, the zero-one misclassification error $M^h_s$ of a learner $h$ for a subset $s$ cannot be derived analytically but only estimated (typically by cross-validation).
• A wrapper selection returns
$$s^h = \arg\min_{s \subset x,\, |s| \le d} \hat{M}^h_s \qquad (1)$$
where $\hat{M}^h_s$ is the estimate of the misclassification error of the learner $h$ (e.g. computed by cross-validation).
If a wrapper strategy relies on a generic learner $h$, that is, a learner where the bias term $b(s)$ is significantly different from zero, the returned feature selection will depend on a quantity which is a biased estimate of the term $r(s)$ and consequently of the relevance $R_s$. In other words, wrappers do not maximize relevance.

Unbiased wrapper approach
• Intuitively, the bias would be reduced if we adopted a learner having a small bias term. A low-bias, yet high-variance, learner is the k-nearest neighbour classifier (kNN) for small values of k.
• In particular, it has been shown that for a 1NN learner and a binary classification problem
$$\lim_{N \to \infty} M^{1NN}_s = 1 - R_s$$
where $M^{1NN}_s$ is the misclassification error of a nearest-neighbour classifier.
• Since cross-validation returns a consistent estimate of $M^h_s$, and since $M^{1NN}_s$ asymptotically converges to one minus the relevance $R_s$, the quantity $1 - \hat{M}^{1NN}_s$ is a consistent estimator of the relevance $R_s$.
• We propose then as relevance estimator
$$\hat{R}^{kNN}_s = 1 - \hat{M}^{kNN}_s$$
where $\hat{M}^{kNN}_s$ is the cross-validation estimate of the misclassification error of a kNN learner with low k. This term returns an (asymptotically) unbiased, yet high-variance, estimate of the relevance of the subset $s$.
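In practice $\hat{R}^{kNN}_s$ amounts to a few lines of code. A minimal sketch with scikit-learn (a library choice of ours; the slides name none), where X_subset holds the columns of X indexed by s:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def relevance_knn(X_subset, y, k=1, cv=10):
    """R_hat^kNN_s = 1 - cross-validated misclassification error of a
    low-k kNN; cross_val_score returns accuracy for classifiers,
    which is exactly 1 - error."""
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_subset, y, cv=cv)
    return scores.mean()
```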


Reducing the variance of the estimator
The low-bias, high-variance nature of the $\hat{R}^{kNN}_s$ estimator suggests that the best way to employ it is by combining it with other relevance estimators.
We will take into consideration two possible estimators to combine with:
1. a direct model-based estimator $\hat{p}_1$ of the conditional probability $p_1(s) = \text{Prob}\{y = y_1 \mid s\}$ and consequently of the quantity $r(s)$.
This estimator first samples a set of $N$ unclassified input vectors $s_i$ according to the empirical distribution $\hat{F}_s$ and then computes the Monte Carlo estimate
$$\hat{R}^D_s = \frac{1}{N} \sum_{i=1}^{N} \left[ \hat{p}_1^2(s_i) + \hat{p}_0^2(s_i) \right] = 1 - \frac{2}{N} \sum_{i=1}^{N} \hat{p}_1(s_i)\big(1 - \hat{p}_1(s_i)\big)$$
A similar estimator was proposed by Fukunaga in 1973 to estimate the Bayes error.
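A plug-in sketch of this estimator in Python, under the assumption that sampling from $\hat{F}_s$ reduces to reusing the $N$ observed inputs $s_i$; the choice of probability model (a 5-NN here) is ours, not the slides':

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def relevance_direct(X_subset, y, model=None):
    """Monte Carlo plug-in estimate R_hat^D_s = 1 - (2/N) sum phat1(1 - phat1),
    assuming binary labels. In practice, out-of-sample probabilities would
    avoid the optimism of scoring the training points themselves."""
    if model is None:
        model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X_subset, y)
    p1 = model.predict_proba(X_subset)[:, 1]   # phat1(s_i) at observed inputs
    return 1 - 2 * np.mean(p1 * (1 - p1))
```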

2. a filter estimator based on the notion of mutual information: several filter algorithms exploit this notion in order to estimate the relevance. An example is the MRMR algorithm (Peng et al., 05), where the relevance of a feature subset $s$, expressed in terms of the mutual information $I(s; y) = H(y) - H(y \mid s)$, is approximated by the incremental formulation
$$I_{MRMR}(s; y) = I_{MRMR}(s_i; y) + I(x_i; y) - \frac{1}{m - 1} \sum_{x_j \in s_i} I(x_j; x_i) \qquad (2)$$
where $x_i$ is a feature belonging to the subset $s$, $s_i$ is the set $s$ with the feature $x_i$ set aside and $m$ is the number of components of $s$. Now, since $H(y \mid s) = H(y) - I(s; y)$ and $G_s = 1 - R_s = G_H(H(y \mid s))$, we obtain that
$$\hat{R}^{MRMR}_s = 1 - G_H\big(H(y) - I_{MRMR}(s; y)\big)$$
is an MRMR estimator of the relevance $R_s$, where $G_H(\cdot)$ is the monotone mapping between the entropy $H$ and the Gini index.
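One possible rendering of the incremental formulation (2), assuming the features have already been discretized into integer codes (e.g. quantile bins); note that scikit-learn's mutual_info_score works in nats, so combining the result with an entropy-to-Gini map such as the G_H sketched earlier requires consistent units:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def i_mrmr(X_disc, y, subset):
    """MRMR approximation of I(s; y): add each feature's relevance
    I(x_i; y) minus its average redundancy with the features already in.
    X_disc: (N, n) integer-coded matrix; subset: ordered column indices."""
    total, chosen = 0.0, []
    for xi in subset:
        redundancy = (np.mean([mutual_info_score(X_disc[:, xj], X_disc[:, xi])
                               for xj in chosen])
                      if chosen else 0.0)
        total += mutual_info_score(X_disc[:, xi], y) - redundancy
        chosen.append(xi)
    return total
```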

Proposed relevance estimators
We propose two novel relevance estimators based on the principle of averaging,
$$\hat{R}'_s = \frac{\hat{R}^{CV}_s + \hat{R}^D_s}{2}, \qquad \hat{R}''_s = \frac{\hat{R}^{CV}_s + \hat{R}^{MRMR}_s}{2},$$
where $\hat{R}^{CV}_s$ denotes the cross-validated kNN estimator $\hat{R}^{kNN}_s$ introduced above, and the associated feature selection algorithms:
$$s^{R'} = \arg\max_{s \subset x,\, |s| \le d} \hat{R}'_s, \qquad s^{R''} = \arg\max_{s \subset x,\, |s| \le d} \hat{R}''_s$$
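As a sketch of how these pieces compose (using the helpers above; the greedy forward search is one plausible strategy of ours, since the slides do not fix how the arg max is explored):

```python
def relevance_avg(X, y, subset, k=1):
    """R_hat'_s: average of the kNN cross-validation estimator and the
    direct plug-in estimator sketched earlier."""
    Xs = X[:, subset]
    return 0.5 * (relevance_knn(Xs, y, k=k) + relevance_direct(Xs, y))

def forward_select(X, y, d, score=relevance_avg):
    """Greedy forward search for arg max over subsets with |s| <= d."""
    selected = []
    for _ in range(d):
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        best = max(remaining, key=lambda j: score(X, y, selected + [j]))
        selected.append(best)
    return selected
```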

Experimental session
• 20 public-domain microarray expression datasets
• external three-fold cross-validation scheme (a sketch of the protocol follows this list)
• to avoid any dependency between the learning algorithm employed by the wrapper and the classifier used for prediction, the experimental session is composed of two parts:
• Part 1: comparison with the wrapper WSVM, using the set of classifiers C1 = {TREE, NB, SVMSIGM, LDA, LOG}, which does not include the SVMLIN learner;
• Part 2: comparison with the wrapper WNB, using the set of classifiers C2 = {TREE, SVMSIGM, SVMLIN, LDA, LOG}, which does not include the NB learner.
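One possible reading of this protocol in Python (names such as select and classifier_factory are ours): the selection step is re-run inside every training fold, so the reported error is an honest external estimate:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def external_cv_error(X, y, select, classifier_factory, d=10, n_splits=3):
    """External three-fold CV: feature selection sees only the training
    fold, and the held-out fold is used purely for error estimation."""
    errs = []
    for tr, te in StratifiedKFold(n_splits=n_splits).split(X, y):
        subset = select(X[tr], y[tr], d)              # e.g. forward_select
        clf = classifier_factory().fit(X[tr][:, subset], y[tr])
        errs.append(np.mean(clf.predict(X[te][:, subset]) != y[te]))
    return float(np.mean(errs))
```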

Experiments with cancer datasets
(N = number of samples, n = number of features, K = number of classes)
Name N n K
Golub 72 7129 2
Alon 62 2000 2
Notterman 36 7457 2
Nutt 50 12625 2
Shipp 77 7129 2
Singh 102 12600 2
Sorlie 76 7937 2
Wang 286 22283 2
Van’t Veer 65 24481 2
VandeVijver 295 24496 2
Sotiriou 99 7650 2
Pomeroy 60 7129 2
Khan 63 2308 4
Hedenfalk 22 3226 3
West 49 7129 4
Staunton 60 7129 9
Su 174 12533 11
Bhattacharjee 203 12600 5
Armstrong 72 12582 3
Ma 60 22575 3

Results: 1st part (comparison with the wrapper WSVM)
Name R’ WSVM R” MRMR RANK
Golub 0.0917 0.1177 0.1 0.1079 0.1225
Alon 0.2704 0.2658 0.2267 0.1996 0.2281
Notterman 0.1966 0.0985 0.1494 0.1472 0.1432
Nutt 0.3798 0.4171 0.3873 0.3847 0.4189
Shipp 0.1429 0.1319 0.1322 0.1362 0.1873
Singh 0.1619 0.1517 0.1266 0.1374 0.1328
Sorlie 0.3835 0.4314 0.3963 0.4004 0.3987
Wang 0.4282 0.4111 0.4218 0.4232 0.4181
Van’t Veer 0.2786 0.2638 0.2492 0.2217 0.2277
VandeVijver 0.454 0.4724 0.4365 0.4636 0.4482
Sotiriou 0.5279 0.5796 0.5351 0.5708 0.5339
Pomeroy 0.428 0.4191 0.4141 0.3876 0.4181
Khan 0.0878 0.1143 0.0582 0.0686 0.131
Hedenfalk 0.5475 0.5263 0.452 0.5273 0.5389
West 0.6463 0.6109 0.6186 0.5746 0.6109
Staunton 0.6822 0.71 0.6511 0.6865 0.7407
Su 0.2568 0.307 0.2549 0.3772 0.3352
Bhattacharjee 0.1232 0.1347 0.1105 0.1057 0.1515
Armstrong 0.1082 0.1199 0.1306 0.115 0.1122
Ma 0.2456 0.2041 0.2257 0.2413 0.2317
AVG 0.323 0.331 0.310 0.326 0.331
Worse/Better (W/B) than R' (R'') 10/7 9/6 9/2

Results: 2nd part (comparison with the wrapper WNB)
Name R’ WNB R” MRMR RANK
Golub 0.0886 0.1114 0.0971 0.1019 0.0904
Alon 0.2376 0.2568 0.2181 0.2109 0.221
Notterman 0.1852 0.2059 0.1491 0.1512 0.1645
Nutt 0.3929 0.3402 0.36 0.3898 0.4258
Shipp 0.1261 0.127 0.1198 0.1338 0.1734
Singh 0.1495 0.1454 0.1297 0.1377 0.1245
Sorlie 0.3848 0.4254 0.3808 0.3953 0.3838
Wang 0.4363 0.4345 0.4298 0.4281 0.4255
Van’t Veer 0.2747 0.2715 0.2421 0.2253 0.2325
VandeVijver 0.4626 0.44 0.4763 0.4721 0.4358
Sotiriou 0.5126 0.5578 0.5505 0.5732 0.5611
Pomeroy 0.4367 0.4389 0.4007 0.3902 0.4224
Khan 0.0804 0.0896 0.0628 0.0631 0.0901
Hedenfalk 0.5379 0.5187 0.4369 0.4904 0.4949
West 0.6413 0.6696 0.5542 0.5882 0.6728
Staunton 0.6689 0.8298 0.6981 0.6661 0.83
Su 0.2544 0.3096 0.2646 0.3739 0.3529
Bhattacharjee 0.1235 0.1209 0.101 0.1061 0.1186
Armstrong 0.1079 0.1668 0.125 0.1148 0.1034
Ma 0.2565 0.2635 0.2335 0.2443 0.2681
AVG 0.322 0.3335 0.315 0.327 0.331
Worse/Better (W/B) than R' (R'') 9/2 10/3 11/2

Conclusions
• Feature selection demands an accurate estimation of the relevance of subsets of features.
• Wrapper methods use a cross-validation estimation of the misclassification error with generic learners. We show that this entails a biased estimation of relevance.
• The cross-validation assessment $\hat{R}^{kNN}_s$ returned by kNN techniques with low k provides a low-bias yet high-variance estimator of relevance.
• The variance can be reduced by combining it with other estimators.
• Experiments on real datasets showed that the resulting relevance estimator can outperform both conventional wrapper and filter algorithms.