List of Figures

Introduction
    Conventional forecasting and Takens' embedding theorem
    Implications of observational noise
    Implications of dynamic noise
    Example
    Conclusion
    Objective of this book

A Universal Approximator Network for Predicting Conditional Probability Densities
    Introduction
    A single-hidden-layer network
    An additional hidden layer
    Regaining the conditional probability density
    Moments of the conditional probability density
    Interpretation of the network parameters
    Gaussian mixture model
    Derivative-of-sigmoid versus Gaussian mixture model
    Comparison with other approaches
        Predicting local error bars
        Indirect method
        Complete kernel expansion: Conditional Density Estimation Network (CDEN) and Mixture Density Network (MDN)
        Distorted Probability Mixture Network (DPMN)
        Mixture of Experts (ME) and Hierarchical Mixture of Experts (HME)
        Soft histogram
    Summary
    Appendix: The moment generating function for the DSM network

A Maximum Likelihood Training Scheme
    The cost function
    A gradient-descent training scheme
        Output weights
        Kernel widths
        Remaining weights
    Interpretation of the parameter adaptation rules
    Deficiencies of gradient descent and their remedy
    Summary
    Appendix

Benchmark Problems
    Logistic map with intrinsic noise
    Stochastic coupling between two stochastic dynamical systems
    Brownian motion in a double-well potential
    Summary

Demonstration of the Model Performance on the Benchmark Problems
    Introduction
    Logistic map with intrinsic noise
        Method
        Results
    Stochastic coupling between two stochastic dynamical systems
        Method
        Results
        Auto-pruning
    Brownian motion in a double-well potential
        Method
        Results
    Comparison with other approaches
    Conclusions
    Discussion

Random Vector Functional Link (RVFL) Networks
    The RVFL theorem
    Proof of the RVFL theorem
    Comparison with the multilayer perceptron
    A simple illustration
    Summary

Improved Training Scheme Combining the Expectation Maximisation (EM) Algorithm with the RVFL Approach
    Review of the Expectation Maximisation (EM) algorithm
    Simulation: Application of the GM network trained with the EM algorithm
        Method
        Results
        Discussion
    Combining EM and RVFL
        Preventing numerical instability
        Regularisation
    Summary
    Appendix

Empirical Demonstration: Combining EM and RVFL
    Method
    Application of the GM-RVFL network to predicting the stochastic logistic-kappa map
        Training a single model
        Training an ensemble of models
    Application of the GM-RVFL network to the double-well problem
        Committee selection
        Prediction
        Comparison with other approaches
    Discussion

A Simple Bayesian Regularisation Scheme
    A Bayesian approach to regularisation
        A simple example: repeated coin flips
        A conjugate prior
    EM algorithm with regularisation
    The posterior mode
    Discussion

The Bayesian Evidence Scheme for Regularisation
    Introduction
    A simple illustration of the evidence idea
    Overview of the evidence scheme
        First step: Gaussian approximation to the probability in parameter space
        Second step: Optimising the hyperparameters
        A self-consistent iteration scheme
    Implementation of the evidence scheme
        First step: Gaussian approximation to the probability in parameter space
        Second step: Optimising the hyperparameters
        Algorithm
    Discussion
        Improvement over the maximum likelihood estimate
        Justification of the approximations
        Final remark

The Bayesian Evidence Scheme for Model Selection
    The evidence for the model
        An uninformative prior
        Comparison with MacKay's work
    Interpretation of the model evidence
        Ockham factors for the weight groups
        Ockham factors for the kernel widths
        Ockham factor for the priors
    Discussion

Demonstration of the Bayesian Evidence Scheme for Regularisation
    Method and objective
        Initialisation
        Different training and regularisation schemes
        Pruning
    Large data set
    Small data set
    Number of well-determined parameters and pruning
        Automatic self-pruning
        Mathematical elucidation of the pruning scheme
    Summary and Conclusion

Network Committees and Weighting Schemes
    Network committees for interpolation
    Network committees for modelling conditional probability densities
    Weighting schemes for predictors
        Introduction
        A Bayesian approach
        Numerical problems with the model evidence
        A weighting scheme based on the cross-validation performance

Demonstration: Committees of Networks Trained with Different Regularisation Schemes
    Method and objective
    Single-model prediction
    Committee prediction
        Best and average single-model performance
        Improvement over the average single-model performance
        Improvement over the best single-model performance
        Robustness of the committee performance
        Dependence on the temperature
        Dependence on the temperature when including biased models
        Optimal temperature
    Model selection and evidence
    Advantage of under-regularisation and over-fitting
    Conclusions

Automatic Relevance Determination (ARD)
    Introduction
    Two alternative ARD schemes
    Mathematical implementation
    Empirical demonstration

A Real-World Application: The Boston Housing Data
    A real-world regression problem: The Boston house-price data
    Prediction with a single model
        Methodology
        Results
    Test of the ARD scheme
        Methodology
        Results
    Prediction with network committees
        Objective
        Methodology
        Weighting scheme and temperature
        ARD parameters
        Comparison between the two ARD schemes
        Number of kernels
        Bayesian regularisation
        Network complexity
        Cross-validation
    Discussion: How overfitting can be useful
        Increasing diversity
        Bagging
        Nonlinear preprocessing
    Comparison with Neal's results
    Conclusions

Summary

Appendix: Derivation of the Hessian for the Bayesian Evidence Scheme
    Introduction and notation
    A decomposition of the Hessian using EM
    Explicit calculation of the Hessian
    Discussion

References

Index