Probabilistic Machine Learning: Advantages of Using Probabilistic Models
The basic idea of Bayesian sensor fusion is to take the uncertainty of information into account. In machine learning the seminal papers were those by (MacKay 1992), who discussed the effects of model uncertainty. In (Wright 1999) these ideas were later extended to input uncertainty. Closely related is the work by (Dellaportas & Stephens 1995), who discuss models for errors in observables. I got interested in these issues in the context of hierarchical models, where the model parameters of a feature extraction stage are used for diagnosis or segmentation purposes. Such models are used, for example, for sleep analysis and in the context of brain computer interfaces (BCI). In a Bayesian sense these features are latent variables and should be treated as such. Again, this is a consistency argument whose practical relevance has to be examined. In order to obtain a hierarchical model that does sensor fusion, we simply regard the feature extraction stage as a latent space and integrate (marginalize) over all uncertainties. The left figure compares a sensor-fusing DAG with current practice in many applications of probabilistic modeling, which regards extracted features as observations.
I reported a first attempt to approach this problem in (Sykacek 1999), which is described in more detail in section 4 of my Ph.D. thesis. To see that a latent feature space has practical advantages, consider a very simple case where two sensors are fused in a naive Bayes manner to predict the posterior probability in a two-class problem. The model is similar to the one in the graph above, however with two latent feature stages that are assumed independent conditional on the state of interest t. The plot on the right illustrates the effect of knowing one of the latent features with varying precision. Conditioning on a best estimate obviously results in probabilities that are independent of the precision; we hence obtain a flat line at probability P(2)=0.27. Marginalization modifies the probabilities, which can, as we see, also modify the predicted state. We may thus expect improvements in cases where the precision of the distributions in the latent feature space varies. We have successfully applied an HMM-based latent feature space model to the classification of segments of multi-sensor time series. Such problems arise in clinical diagnosis (sleep classification) and in the context of brain computer interfaces (BCI). An MCMC implementation and an evaluation on synthetic and BCI data have been published in (Sykacek & Roberts 2002a). Recently (Beal et al. 2002) applied similar ideas to sensor fusion of audio and video data.
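The effect can be reproduced in a few lines. The sketch below uses made-up one-dimensional Gaussian class-conditionals for the two features (the means, variances, prior, and observed values are illustrative choices, not the settings behind the P(2)=0.27 plot). The key point is that marginalizing over a Gaussian latent feature has a closed form: the convolution of N(phi; mu, sigma^2) with N(phi; m, s^2) is N(mu; m, sigma^2 + s^2), so the plug-in and marginalized posteriors can be compared directly as the precision varies.

```python
import math

# Illustrative model (all numbers made up): two classes t in {1, 2},
# two conditionally independent Gaussian features with unit variance.
PRIOR = {1: 0.5, 2: 0.5}
MEANS = {1: (0.0, 0.0), 2: (2.0, 2.0)}  # (sensor-1 mean, sensor-2 mean) per class
S2 = 1.0                                # class-conditional feature variance

def gauss(x, mean, var):
    """Gaussian density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def plug_in_posterior(mu1, x2):
    """Condition on the best estimate mu1 of feature 1 (ignores its precision)."""
    lik = {t: PRIOR[t] * gauss(mu1, MEANS[t][0], S2) * gauss(x2, MEANS[t][1], S2)
           for t in PRIOR}
    z = sum(lik.values())
    return {t: v / z for t, v in lik.items()}

def marginal_posterior(mu1, var1, x2):
    """Integrate out feature 1, phi1 ~ N(mu1, var1).
    The Gaussian convolution is analytic:
    int N(phi; mu1, var1) N(phi; m, S2) dphi = N(mu1; m, var1 + S2)."""
    lik = {t: PRIOR[t] * gauss(mu1, MEANS[t][0], var1 + S2) * gauss(x2, MEANS[t][1], S2)
           for t in PRIOR}
    z = sum(lik.values())
    return {t: v / z for t, v in lik.items()}

if __name__ == "__main__":
    # Sensor-1 estimate favours class 2, the sensor-2 reading favours class 1.
    mu1, x2 = 1.5, 0.8
    print("plug-in  P(2) = %.3f" % plug_in_posterior(mu1, x2)[2])
    for sigma in (0.1, 1.0, 3.0):
        p2 = marginal_posterior(mu1, sigma ** 2, x2)[2]
        print("sigma=%.1f P(2) = %.3f" % (sigma, p2))
```

With these numbers the plug-in posterior stays fixed for every precision, while the marginalized P(2) decreases as sigma grows, and for a sufficiently imprecise feature the predicted state flips from 2 to 1, which is exactly the behaviour the plot illustrates.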
References
(MacKay 1992) D. J. C. MacKay. Bayesian interpolation. Neural Computation, pages 415-447, 1992.
(Bernardo & Smith 1994) J. M. Bernardo and A. F. M. Smith. Bayesian Theory. John Wiley & Sons, Chichester UK, 1994.
(Dellaportas & Stephens 1995) P. Dellaportas and S. A. Stephens. Bayesian analysis of errors-in-variables regression models. Biometrics, 51:1085-1095, 1995.
(Wright 1999) W. A. Wright. Bayesian approach to neural-network modeling with input uncertainty. IEEE Trans. Neural Networks, 10:1261-1270, 1999.
(Sykacek 1999) P. Sykacek. Learning from uncertain and probably missing data. OEFAI. An abstract is available on the workshop homepage.
(Beal et al. 2002) M. J. Beal, H. Attias and N. Jojic. A self-calibrating algorithm for speaker tracking based on audio-visual statistical models. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, 2002.
http://www.robots.ox.ac.uk/~psyk/research.html
3/8/2011