
Data splitting

Z Reitermanova - WDS, 2010 - physics.mff.cuni.cz



Abstract

In machine learning, one of the main requirements is to build computational models that generalize well from the knowledge they extract. When training, e.g., artificial neural networks, poor generalization is often characterized by over-training. A common method to avoid over-training is hold-out cross-validation. The basic problem of this method, however, is appropriate data splitting. In most applications, simple random sampling is used. Nevertheless, there are several sophisticated statistical sampling methods suitable for various types of datasets.
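
As a rough illustration of the hold-out split with simple random sampling mentioned in the abstract, the following minimal Python sketch may help. It is not taken from the paper: the function name `holdout_split`, the `test_fraction` and `seed` parameters, and the 70/30 ratio are all assumptions chosen for the example.

```python
import random


def holdout_split(samples, test_fraction=0.3, seed=0):
    """Split samples into (train, test) by simple random sampling.

    Hypothetical helper for illustration only; the name and the
    default fraction are assumptions, not the paper's method.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    # Shuffle the indices with a seeded generator so the split is repeatable.
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)

    # The first n_test shuffled indices form the held-out set.
    n_test = round(len(samples) * test_fraction)
    test_idx = set(indices[:n_test])

    # Each sample lands in exactly one of the two subsets.
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test


data = list(range(10))
train, test = holdout_split(data, test_fraction=0.3, seed=42)
```

Every sample is drawn into the held-out set with equal probability, which is what makes this simple random sampling; it takes no account of class balance or the distribution of the data, which is presumably where the more sophisticated sampling methods the abstract alludes to come in.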


