Abstract
In most practical problems of classifier learning, the training data suffers from label noise. Most theoretical results on robustness to label noise involve either estimating the noise rates or non-convex optimization, and none of them apply to standard decision tree learning algorithms. This paper presents a theoretical analysis showing that, under some assumptions, many popular decision tree learning algorithms are inherently robust to label noise. We also present sample complexity results that bound the sample size needed for this robustness to hold with high probability. We illustrate the robustness through extensive simulations.
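As a rough illustration of the kind of simulation referred to above (a minimal sketch, not the authors' experimental protocol; the synthetic data set, noise rate, and tree settings are arbitrary choices), one can flip training labels symmetrically with rate \(\eta\) and compare the test accuracy of a standard decision tree learned from clean versus noisy labels:

```python
# Minimal robustness check under symmetric label noise (illustrative only;
# the data set, noise rate eta, and tree parameters are arbitrary choices).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

eta = 0.3                                   # symmetric noise rate, must be < 0.5
flip = rng.random(len(y_tr)) < eta
y_noisy = np.where(flip, 1 - y_tr, y_tr)    # each training label flipped w.p. eta

for labels, name in [(y_tr, "clean"), (y_noisy, "noisy")]:
    tree = DecisionTreeClassifier(criterion="gini", min_samples_leaf=50,
                                  random_state=0).fit(X_tr, labels)
    print(name, "labels -> test accuracy:", round(tree.score(X_te, y_te), 3))
```

With enough samples per node (enforced crudely here via min_samples_leaf), the two accuracies are typically close, which is the behaviour the analysis in the paper predicts.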
Notes
1. For simplicity, we do not consider pruning of the tree.
A Proof Sketch of Lemmas 1, 2
Let \(n^+\) (\({\tilde{n}}^+\)) and \(n^-\) (\({\tilde{n}}^-\)) denote the number of positive and negative samples at the node in the noise-free (noisy) case. Taking the positive class to be the majority, we write \(\rho = (n^+ - n^-)/n\). Using the Hoeffding bound, it is easy to show that \(\Pr [{\tilde{n}}^+ - {\tilde{n}}^- < 0]\le \exp \left( -\frac{\rho ^2 n (1 - 2 \eta )^2}{2}\right) \). Hence the number of samples needed is bounded as \(n > \frac{2}{\rho ^2 (1 - 2 \eta )^2} \ln (\frac{1}{\delta })\), which completes the proof of Lemma 1.
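As a hypothetical numerical check of this bound (the values of \(\rho\), \(\eta\), and \(\delta\) below are arbitrary illustrative choices, not taken from the paper), one can evaluate the sample-size expression and estimate the flip probability by Monte Carlo:

```python
# Numerical check of the Lemma 1 bound (rho, eta, delta are arbitrary choices).
import numpy as np

rho, eta, delta = 0.2, 0.3, 0.05
n = int(np.ceil(2.0 / (rho**2 * (1 - 2 * eta) ** 2) * np.log(1.0 / delta)))
print("samples needed by the bound:", n)

rng = np.random.default_rng(0)
n_pos = int(round(n * (1 + rho) / 2))           # noise-free positive count n^+
n_neg = n - n_pos                               # noise-free negative count n^-
trials = 20000
flip_pos = rng.random((trials, n_pos)) < eta    # positives flipped to negative
flip_neg = rng.random((trials, n_neg)) < eta    # negatives flipped to positive
noisy_pos = n_pos - flip_pos.sum(axis=1) + flip_neg.sum(axis=1)
prob_flip = np.mean(noisy_pos < n - noisy_pos)  # event {tilde n^+ - tilde n^- < 0}
print("estimated flip probability:", prob_flip)
print("Hoeffding bound           :", np.exp(-rho**2 * n * (1 - 2 * eta) ** 2 / 2))
```

The estimated probability should come out below the Hoeffding bound, which in turn is at most \(\delta\) for this choice of n.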
Let \(n, n_l, n_r\) be the number of samples at \(v, v_l, v_r\), and recall that \(n_l=an\) and \(n_r=(1-a)n\). Recall also that \({\tilde{p}}, {\tilde{p}}_l, {\tilde{p}}_r\) are the fractions of positive samples at \(v, v_l, v_r\) and \(p^{\eta }, p^{\eta }_l, p^{\eta }_r\) are their large-sample values. Then, using Hoeffding bounds (with \(\epsilon _1=\epsilon \), \(\epsilon _2=\epsilon /\sqrt{a}\) and \(\epsilon _3=\epsilon /\sqrt{1-a}\)), we get \(\Pr \left[ |{\tilde{p}}-p^{\eta }|\le \epsilon _1,\; |{\tilde{p}}_l-p^{\eta }_l|\le \epsilon _2,\; |{\tilde{p}}_r-p^{\eta }_r|\le \epsilon _3\right] \ge 1-6\exp (-2n\epsilon ^2)\).
When this event occurs, some algebraic manipulation shows that, for the Gini impurity, \(|\hat{\text {gain}}_{\text {Gini}}^{\eta }(f)-\text {gain}_{\text {Gini}}^{\eta }(f)|\le 6(1-2\eta )\epsilon \), where \(\hat{\text {gain}}_{\text {Gini}}^{\eta }\) is the (random) Gini gain under noise computed from n samples and \(\text {gain}_{\text {Gini}}^{\eta }\) is its large-sample limit. This gives the bound needed in Lemma 2. The lemma can be proved similarly for the other splitting criteria.
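To illustrate this concentration empirically (a sketch under the assumption of symmetric noise, for which the noisy positive fraction is \((1-2\eta )q + \eta \) when the noise-free fraction is q; the split parameters a, \(p_l\), \(p_r\) and the noise rate below are arbitrary choices), one can draw n noisy samples for a fixed split and compare the empirical Gini gain with its large-sample value:

```python
# Concentration of the noisy Gini gain around its large-sample limit
# (split parameters a, p_l, p_r and noise rate eta are arbitrary choices).
import numpy as np

def gini(p):
    return 2 * p * (1 - p)

def noisy(q, eta):
    # positive fraction after symmetric label noise with rate eta
    return (1 - 2 * eta) * q + eta

a, p_l, p_r, eta = 0.4, 0.8, 0.3, 0.2
p = a * p_l + (1 - a) * p_r                  # noise-free positive fraction at v
gain_limit = (gini(noisy(p, eta))
              - a * gini(noisy(p_l, eta))
              - (1 - a) * gini(noisy(p_r, eta)))

rng = np.random.default_rng(0)
for n in [500, 5000, 50000]:
    n_l = int(a * n); n_r = n - n_l          # n_l = a*n, n_r = (1-a)*n
    devs = []
    for _ in range(200):
        pos_l = rng.random(n_l) < noisy(p_l, eta)   # noisy labels at v_l
        pos_r = rng.random(n_r) < noisy(p_r, eta)   # noisy labels at v_r
        pt_l, pt_r = pos_l.mean(), pos_r.mean()
        pt = (pos_l.sum() + pos_r.sum()) / n
        gain_hat = gini(pt) - a * gini(pt_l) - (1 - a) * gini(pt_r)
        devs.append(abs(gain_hat - gain_limit))
    print(f"n={n:6d}  mean |gain_hat - gain_limit| = {np.mean(devs):.4f}")
```

The deviation shrinks at roughly the \(1/\sqrt{n}\) rate that the Hoeffding argument predicts.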
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Ghosh, A., Manwani, N., Sastry, P.S. (2017). On the Robustness of Decision Tree Learning Under Label Noise. In: Kim, J., Shim, K., Cao, L., Lee, J.-G., Lin, X., Moon, Y.-S. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol. 10234. Springer, Cham. https://doi.org/10.1007/978-3-319-57454-7_53