
Neural Networks for Conditional Probability Estimation: Forecasting Beyond Point Predictions

ISBN-10: 1852330953

ISBN-13: 9781852330958

Edition: 1999

Authors: Dirk Husmeier, J. G. Taylor

Description:

This volume presents a neural network architecture for the prediction of conditional probability densities, which is vital when carrying out universal approximation on variables that are either strongly skewed or multimodal. Two alternative approaches are discussed: the GM network, in which all parameters are adapted in the training scheme, and the GM-RVFL model, which draws on the random vector functional link (RVFL) net approach. Points of particular interest are:
- it examines the modification to standard approaches needed for conditional probability prediction;
- it provides the first real-world test results for recent theoretical findings about the relationship between generalisation performance of…
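To make the idea of predicting a full conditional density (rather than a point forecast) concrete, here is a minimal sketch of a single-hidden-layer network whose outputs parameterise a Gaussian mixture over the target. It is not the book's GM or GM-RVFL implementation; all layer sizes, weight names, and the tanh/softmax choices below are illustrative assumptions.

```python
# Hypothetical sketch: a network that outputs the parameters of a Gaussian
# mixture, so that p(y | x) is modelled instead of a single point prediction.
import numpy as np

rng = np.random.default_rng(0)

H, K = 16, 3                                   # hidden units, mixture kernels
W1 = rng.normal(scale=0.5, size=(H, 1))        # input -> hidden weights
b1 = np.zeros(H)
W_pi  = rng.normal(scale=0.5, size=(K, H))     # hidden -> mixing coefficients
W_mu  = rng.normal(scale=0.5, size=(K, H))     # hidden -> kernel centres
W_sig = rng.normal(scale=0.5, size=(K, H))     # hidden -> kernel widths

def conditional_density(y, x):
    """Evaluate p(y | x) under the Gaussian-mixture output."""
    h = np.tanh(W1 @ np.atleast_1d(x) + b1)             # hidden layer
    logits = W_pi @ h
    pi = np.exp(logits - logits.max()); pi /= pi.sum()  # softmax -> mixing coeffs
    mu = W_mu @ h                                        # kernel means
    sigma = np.exp(W_sig @ h)                            # positive kernel widths
    comps = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return float(pi @ comps)

def neg_log_likelihood(data):
    """Maximum-likelihood cost: -sum log p(y_t | x_t) over training pairs."""
    return -sum(np.log(conditional_density(y, x)) for x, y in data)

# Toy usage: density of y given x = 0.2, and the cost on two (x, y) pairs.
print(conditional_density(0.5, 0.2))
print(neg_log_likelihood([(0.1, 0.3), (0.2, -0.4)]))
```

Minimising the negative log-likelihood above corresponds to the maximum-likelihood training scheme discussed in the book's contents; the Bayesian regularisation and evidence chapters then refine how these parameters are estimated and how models are compared.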

Book details

List price: $99.00
Copyright year: 1999
Publisher: Springer London, Limited
Publication date: 2/22/1999
Binding: Paperback
Pages: 275
Size: 6.25" wide x 9.50" long x 0.75" tall
Weight: 0.990 lbs
Language: English

List of Figures
Introduction
Conventional forecasting and Takens' embedding theorem
Implications of observational noise
Implications of dynamic noise
Example
Conclusion
Objective of this book
A Universal Approximator Network for Predicting Conditional Probability Densities
Introduction
A single-hidden-layer network
An additional hidden layer
Regaining the conditional probability density
Moments of the conditional probability density
Interpretation of the network parameters
Gaussian mixture model
Derivative-of-sigmoid versus Gaussian mixture model
Comparison with other approaches
Predicting local error bars
Indirect method
Complete kernel expansion: Conditional Density Estimation Network (CDEN) and Mixture Density Network (MDN)
Distorted Probability Mixture Network (DPMN)
Mixture of Experts (ME) and Hierarchical Mixture of Experts (HME)
Soft histogram
Summary
Appendix: The moment generating function for the DSM network
A Maximum Likelihood Training Scheme
The cost function
A gradient-descent training scheme
Output weights
Kernel widths
Remaining weights
Interpretation of the parameter adaptation rules
Deficiencies of gradient descent and their remedy
Summary
Appendix
Benchmark Problems
Logistic map with intrinsic noise
Stochastic combination of two stochastic dynamical systems
Brownian motion in a double-well potential
Summary
Demonstration of the Model Performance on the Benchmark Problems
Introduction
Logistic map with intrinsic noise
Method
Results
Stochastic coupling between two stochastic dynamical systems
Method
Results
Auto-pruning
Brownian motion in a double-well potential
Method
Results
Comparison with other approaches
Conclusions
Discussion
Random Vector Functional Link (RVFL) Networks
The RVFL theorem
Proof of the RVFL theorem
Comparison with the multilayer perceptron
A simple illustration
Summary
Improved Training Scheme Combining the Expectation Maximisation (EM) Algorithm with the RVFL Approach
Review of the Expectation Maximisation (EM) algorithm
Simulation: Application of the GM network trained with the EM algorithm
Method
Results
Discussion
Combining EM and RVFL
Preventing numerical instability
Regularisation
Summary
Appendix
Empirical Demonstration: Combining EM and RVFL
Method
Application of the GM-RVFL network to predicting the stochastic logistic-kappa map
Training a single model
Training an ensemble of models
Application of the GM-RVFL network to the double-well problem
Committee selection
Prediction
Comparison with other approaches
Discussion
A simple Bayesian regularisation scheme
A Bayesian approach to regularisation
A simple example: repeated coin flips
A conjugate prior
EM algorithm with regularisation
The posterior mode
Discussion
The Bayesian Evidence Scheme for Regularisation
Introduction
A simple illustration of the evidence idea
Overview of the evidence scheme
First step: Gaussian approximation to the probability in parameter space
Second step: Optimising the hyperparameters
A self-consistent iteration scheme
Implementation of the evidence scheme
First step: Gaussian approximation to the probability in parameter space
Second step: Optimising the hyperparameters
Algorithm
Discussion
Improvement over the maximum likelihood estimate
Justification of the approximations
Final remark
The Bayesian Evidence Scheme for Model Selection
The evidence for the model
An uninformative prior
Comparison with MacKay's work
Interpretation of the model evidence
Ockham factors for the weight groups
Ockham factors for the kernel widths
Ockham factor for the priors
Discussion
Demonstration of the Bayesian Evidence Scheme for Regularisation
Method and objective
Initialisation
Different training and regularisation schemes
Pruning
Large Data Set
Small Data Set
Number of well-determined parameters and pruning
Automatic self-pruning
Mathematical elucidation of the pruning scheme
Summary and Conclusion
Network Committees and Weighting Schemes
Network committees for interpolation
Network committees for modelling conditional probability densities
Weighting Schemes for Predictors
Introduction
A Bayesian approach
Numerical problems with the model evidence
A weighting scheme based on the cross-validation performance
Demonstration: Committees of Networks Trained with Different Regularisation Schemes
Method and objective
Single-model prediction
Committee prediction
Best and average single-model performance
Improvement over the average single-model performance
Improvement over the best single-model performance
Robustness of the committee performance
Dependence on the temperature
Dependence on the temperature when including biased models
Optimal temperature
Model selection and evidence
Advantage of under-regularisation and over-fitting
Conclusions
Automatic Relevance Determination (ARD)
Introduction
Two alternative ARD schemes
Mathematical implementation
Empirical demonstration
A Real-World Application: The Boston Housing Data
A real-world regression problem: The Boston house-price data
Prediction with a single model
Methodology
Results
Test of the ARD scheme
Methodology
Results
Prediction with network committees
Objective
Methodology
Weighting scheme and temperature
ARD parameters
Comparison between the two ARD schemes
Number of kernels
Bayesian regularisation
Network complexity
Cross-validation
Discussion: How overfitting can be useful
Increasing diversity
Bagging
Nonlinear Preprocessing
Comparison with Neal's results
Conclusions
Summary
Appendix: Derivation of the Hessian for the Bayesian Evidence Scheme
Introduction and notation
A decomposition of the Hessian using EM
Explicit calculation of the Hessian
Discussion
References
Index