DOI: 10.1145/1953563.1953567

Purifying data by machine learning with certainty levels

Published: 25 July 2010

Abstract

Machine learning is a fundamental paradigm for autonomic computing, self-managing systems, and decision-making under uncertainty and faults. Machine learning uses a data set, that is, a set of data items, where each data item is a vector of feature values together with a classification. Occasionally a data set includes misleading data items that were either introduced by input device malfunctions or maliciously inserted to lead the machine learning to wrong conclusions. A reliable learning algorithm must be able to handle a corrupted data set. Otherwise, an adversary (or simply a malfunctioning input device that corrupts a portion of the data set) may cause inaccurate classifications. The challenge, therefore, is to find effective methods to evaluate and increase the certainty level of the learning process as much as possible. This paper introduces the use of a certainty level measure to obtain better classification capability in the presence of corrupted data items. Assuming a known data distribution (e.g., a normal distribution) and/or a known upper bound on the number of corrupted data items, our techniques define a certainty level for classifications. Another approach enhances the random forest technique to cope with corrupted data items by augmenting the classification obtained in each leaf of the forest with a certainty level. This method is also of independent interest, because it significantly improves the classification of the random forest machine learning technique in less severe settings.
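
To make the idea of a per-leaf certainty level concrete, here is a minimal Python sketch. It is not the paper's algorithm: the measure below (the fraction of a leaf's items that still support the majority label after discounting a known upper bound on corrupted items) is an assumption chosen for illustration. The names `leaf_certainty`, `certainty_weighted_vote`, and `max_corrupted` are hypothetical, and applying the same bound to every leaf is a further simplification.

```python
from collections import Counter


def leaf_certainty(labels, max_corrupted):
    """Return (majority_label, certainty) for the items in one leaf.

    Illustrative assumption: at most `max_corrupted` items in the leaf
    are corrupted, and in the worst case all of them carry the majority
    label. The certainty is the fraction of the leaf's items that still
    support the majority label after discounting those items.
    """
    if not labels:
        raise ValueError("a leaf must contain at least one item")
    label, votes = Counter(labels).most_common(1)[0]
    certainty = max(0, votes - max_corrupted) / len(labels)
    return label, certainty


def certainty_weighted_vote(leaves, max_corrupted):
    """Combine one leaf per tree, weighting each leaf's label by its certainty."""
    totals = {}
    for labels in leaves:
        label, certainty = leaf_certainty(labels, max_corrupted)
        totals[label] = totals.get(label, 0.0) + certainty
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Leaves reached by one query in three hypothetical trees.
    leaves = [
        ["a"] * 9 + ["b"],   # large, nearly pure leaf
        ["b", "b", "a"],     # small leaf, easily swayed by one bad item
        ["b", "b", "a"],
    ]
    print(leaf_certainty(leaves[0], max_corrupted=1))
    print(certainty_weighted_vote(leaves, max_corrupted=1))
```

In this example a plain majority vote over the three leaves would favor "b", while the certainty-weighted vote favors "a", because a single corrupted item could account for the margin in each small leaf.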

Cited By

  • (2023) Purifying Data by Machine Learning with Certainty Levels. Data Analysis and Optimization, 89-102. DOI: 10.1007/978-3-031-31654-8_6. Online publication date: 24-Sep-2023
  • (2014) Reputation Prediction of Anomaly Detection Algorithms for Reliable System. Proceedings of the 2014 IEEE International Conference on Software Science, Technology and Engineering, 19-23. DOI: 10.1109/SWSTE.2014.15. Online publication date: 11-Jun-2014

Published In

WRAS '10: Proceedings of the Third International Workshop on Reliability, Availability, and Security
July 2010
62 pages
ISBN: 9781450306423
DOI: 10.1145/1953563
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. PAC learning
  2. certainty level
  3. data corruption
  4. machine learning

Qualifiers

  • Research-article

Conference

PODC '10