
Neural Networks for Conditional Probability Estimation: Forecasting Beyond Point Predictions

ISBN-10: 1852330953

ISBN-13: 9781852330958

Edition: 1999

Authors: Dirk Husmeier, J. G. Taylor

Description:

This volume presents a neural network architecture for the prediction of conditional probability densities, which is vital when carrying out universal approximation on variables that are either strongly skewed or multimodal. Two alternative approaches are discussed: the GM network, in which all parameters are adapted in the training scheme, and the GM-RVFL model, which draws on the random vector functional link (RVFL) net approach. Points of particular interest are:
- it examines the modification to standard approaches needed for conditional probability prediction;
- it provides the first real-world test results for recent theoretical findings about the relationship between generalisation performance of…
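To make the idea of predicting a full conditional density (rather than a point forecast) concrete, here is a minimal sketch of a single-hidden-layer network whose outputs parameterise a Gaussian mixture over the target. It is not the book's GM or GM-RVFL implementation; all layer sizes, weight names, and the tanh/softmax choices below are illustrative assumptions.

```python
# Hypothetical sketch: a network that outputs the parameters of a Gaussian
# mixture, so that p(y | x) is modelled instead of a single point prediction.
import numpy as np

rng = np.random.default_rng(0)

H, K = 16, 3                                   # hidden units, mixture kernels
W1 = rng.normal(scale=0.5, size=(H, 1))        # input -> hidden weights
b1 = np.zeros(H)
W_pi  = rng.normal(scale=0.5, size=(K, H))     # hidden -> mixing coefficients
W_mu  = rng.normal(scale=0.5, size=(K, H))     # hidden -> kernel centres
W_sig = rng.normal(scale=0.5, size=(K, H))     # hidden -> kernel widths

def conditional_density(y, x):
    """Evaluate p(y | x) under the Gaussian-mixture output."""
    h = np.tanh(W1 @ np.atleast_1d(x) + b1)             # hidden layer
    logits = W_pi @ h
    pi = np.exp(logits - logits.max()); pi /= pi.sum()  # softmax -> mixing coeffs
    mu = W_mu @ h                                        # kernel means
    sigma = np.exp(W_sig @ h)                            # positive kernel widths
    comps = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return float(pi @ comps)

def neg_log_likelihood(data):
    """Maximum-likelihood cost: -sum log p(y_t | x_t) over training pairs."""
    return -sum(np.log(conditional_density(y, x)) for x, y in data)

# Toy usage: density of y given x = 0.2, and the cost on two (x, y) pairs.
print(conditional_density(0.5, 0.2))
print(neg_log_likelihood([(0.1, 0.3), (0.2, -0.4)]))
```

Minimising the negative log-likelihood above corresponds to the maximum-likelihood training scheme discussed in the book's contents; the Bayesian regularisation and evidence chapters then refine how these parameters are estimated and how models are compared.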

Book details

List price: $99.00
Copyright year: 1999
Publisher: Springer London, Limited
Publication date: 2/22/1999
Binding: Paperback
Pages: 275
Size: 6.25" wide x 9.50" long x 0.75" tall
Weight: 0.990 lbs
Language: English

List of Figures
Introduction
Conventional forecasting and Takens' embedding theorem
Implications of observational noise
Implications of dynamic noise
Example
Conclusion
Objective of this book
A Universal Approximator Network for Predicting Conditional Probability Densities
Introduction
A single-hidden-layer network
An additional hidden layer
Regaining the conditional probability density
Moments of the conditional probability density
Interpretation of the network parameters
Gaussian mixture model
Derivative-of-sigmoid versus Gaussian mixture model
Comparison with other approaches
Predicting local error bars
Indirect method
Complete kernel expansion: Conditional Density Estimation Network (CDEN) and Mixture Density Network (MDN)
Distorted Probability Mixture Network (DPMN)
Mixture of Experts (ME) and Hierarchical Mixture of Experts (HME)
Soft histogram
Summary
Appendix: The moment generating function for the DSM network
A Maximum Likelihood Training Scheme
The cost function
A gradient-descent training scheme
Output weights
Kernel widths
Remaining weights
Interpretation of the parameter adaptation rules
Deficiencies of gradient descent and their remedy
Summary
Appendix
Benchmark Problems
Logistic map with intrinsic noise
Stochastic combination of two stochastic dynamical systems
Brownian motion in a double-well potential
Summary
Demonstration of the Model Performance on the Benchmark Problems
Introduction
Logistic map with intrinsic noise
Method
Results
Stochastic coupling between two stochastic dynamical systems
Method
Results
Auto-pruning
Brownian motion in a double-well potential
Method
Results
Comparison with other approaches
Conclusions
Discussion
Random Vector Functional Link (RVFL) Networks
The RVFL theorem
Proof of the RVFL theorem
Comparison with the multilayer perceptron
A simple illustration
Summary
Improved Training Scheme Combining the Expectation Maximisation (EM) Algorithm with the RVFL Approach
Review of the Expectation Maximisation (EM) algorithm
Simulation: Application of the GM network trained with the EM algorithm
Method
Results
Discussion
Combining EM and RVFL
Preventing numerical instability
Regularisation
Summary
Appendix
Empirical Demonstration: Combining EM and RVFL
Method
Application of the GM-RVFL network to predicting the stochastic logistic-kappa map
Training a single model
Training an ensemble of models
Application of the GM-RVFL network to the double-well problem
Committee selection
Prediction
Comparison with other approaches
Discussion
A simple Bayesian regularisation scheme
A Bayesian approach to regularisation
A simple example: repeated coin flips
A conjugate prior
EM algorithm with regularisation
The posterior mode
Discussion
The Bayesian Evidence Scheme for Regularisation
Introduction
A simple illustration of the evidence idea
Overview of the evidence scheme
First step: Gaussian approximation to the probability in parameter space
Second step: Optimising the hyperparameters
A self-consistent iteration scheme
Implementation of the evidence scheme
First step: Gaussian approximation to the probability in parameter space
Second step: Optimising the hyperparameters
Algorithm
Discussion
Improvement over the maximum likelihood estimate
Justification of the approximations
Final remark
The Bayesian Evidence Scheme for Model Selection
The evidence for the model
An uninformative prior
Comparison with MacKay's work
Interpretation of the model evidence
Ockham factors for the weight groups
Ockham factors for the kernel widths
Ockham factor for the priors
Discussion
Demonstration of the Bayesian Evidence Scheme for Regularisation
Method and objective
Initialisation
Different training and regularisation schemes
Pruning
Large Data Set
Small Data Set
Number of well-determined parameters and pruning
Automatic self-pruning
Mathematical elucidation of the pruning scheme
Summary and Conclusion
Network Committees and Weighting Schemes
Network committees for interpolation
Network committees for modelling conditional probability densities
Weighting Schemes for Predictors
Introduction
A Bayesian approach
Numerical problems with the model evidence
A weighting scheme based on the cross-validation performance
Demonstration: Committees of Networks Trained with Different Regularisation Schemes
Method and objective
Single-model prediction
Committee prediction
Best and average single-model performance
Improvement over the average single-model performance
Improvement over the best single-model performance
Robustness of the committee performance
Dependence on the temperature
Dependence on the temperature when including biased models
Optimal temperature
Model selection and evidence
Advantage of under-regularisation and over-fitting
Conclusions
Automatic Relevance Determination (ARD)
Introduction
Two alternative ARD schemes
Mathematical implementation
Empirical demonstration
A Real-World Application: The Boston Housing Data
A real-world regression problem: The Boston house-price data
Prediction with a single model
Methodology
Results
Test of the ARD scheme
Methodology
Results
Prediction with network committees
Objective
Methodology
Weighting scheme and temperature
ARD parameters
Comparison between the two ARD schemes
Number of kernels
Bayesian regularisation
Network complexity
Cross-validation
Discussion: How overfitting can be useful
Increasing diversity
Bagging
Nonlinear Preprocessing
Comparison with Neal's results
Conclusions
Summary
Appendix: Derivation of the Hessian for the Bayesian Evidence Scheme
Introduction and notation
A decomposition of the Hessian using EM
Explicit calculation of the Hessian
Discussion
References
Index