Data splitting
Z Reitermanova - WDS, 2010 - physics.mff.cuni.cz
Abstract
In machine learning, one of the main requirements is to build computational models that generalize well from the knowledge they extract. When training, e.g., artificial neural networks, poor generalization is often characterized by over-training. A common method for avoiding over-training is hold-out cross-validation. The basic problem with this method, however, is choosing an appropriate data split. In most applications, simple random sampling is used. Nevertheless, there are several more sophisticated statistical sampling methods suitable for various types of datasets.
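As a rough illustration of the hold-out idea, the sketch below splits a dataset once by simple random sampling and, for contrast, once by stratified sampling. Stratified sampling is used here only as one familiar example of a more structured sampling scheme; the abstract does not name the methods it has in mind. The function names, the `test_fraction` and `seed` parameters, and the 25% default are all assumptions made for this example.

```python
import random
from collections import defaultdict


def holdout_split(samples, test_fraction=0.25, seed=0):
    """Hold-out split by simple random sampling (illustrative sketch)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)  # seeded so the split is reproducible
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_test = round(len(samples) * test_fraction)
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]
    return train, test


def stratified_holdout_split(samples, labels, test_fraction=0.25, seed=0):
    """Hold-out split that samples each class separately (illustrative sketch).

    Returns two lists of (sample, label) pairs. Each class contributes
    roughly test_fraction of its members to the test set.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    # Group indices by class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train, test = [], []
    for label, idxs in by_label.items():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend((samples[i], label) for i in idxs[:n_test])
        train.extend((samples[i], label) for i in idxs[n_test:])
    return train, test


if __name__ == "__main__":
    data = list(range(100))
    labels = [0] * 80 + [1] * 20  # imbalanced toy labels
    train, test = holdout_split(data)
    print(len(train), len(test))
    s_train, s_test = stratified_holdout_split(data, labels)
    print(sum(1 for _, y in s_test if y == 1), "minority samples in test set")
```

With simple random sampling, the number of minority-class samples that land in the test set varies from seed to seed; the stratified variant fixes that proportion by construction, which is one reason the choice of split can matter on small or imbalanced datasets.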