


Machine learning research


Probabilistic machine learning


Unlike other methods, probabilistic machine learning is based on one consistent principle that is used throughout the entire inference procedure. Probabilistic methods approach inference of latent variables, model coefficients, nuisance parameters and model order essentially by applying Bayesian theory. Hence we may treat all unknown variables identically, which is mathematically appealing. For computational reasons a fully probabilistic model might not be feasible; in such situations we have to resort to approximations. Obviously, for an empirical methodology, a mathematical consistency argument alone is not too convincing. So why should one use probabilistic methods?
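As a brief illustration of what treating all unknowns identically means (the notation below is my own shorthand, not taken from this page): with data D, latent variables z, model coefficients w, nuisance parameters eta and model order M, inference amounts to conditioning on D via Bayes' theorem and marginalising over whatever we are not interested in, for example

p(z, w, \eta, M \mid D) = \frac{p(D \mid z, w, \eta, M)\, p(z, w, \eta \mid M)\, p(M)}{p(D)},
\qquad
p(M \mid D) = \int p(z, w, \eta, M \mid D)\, \mathrm{d}z\, \mathrm{d}w\, \mathrm{d}\eta .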

Advantages of using probabilistic models


Fully probabilistic models avoid "black box" characteristics. We may instantiate arbitrary sets of variables and, for diagnostic purposes, infer the distributions over (or expectations of) other variables of interest, and thus gain some insight into how a particular decision is reached. Using a probabilistic model is relatively easy: inference (if properly implemented) should be insensitive to the setting of all "fiddle parameters" and will thus provide results that are close to optimal. An example that relies on this property is adaptive inference, which I have used for our adaptive BCI. Probabilistic models also provide a means for intelligent sensor fusion, which allows us, for example, to combine information that is known with different certainty, as sketched below.
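As a toy illustration of that last point (my own Gaussian example, not a model from this page): two sensors measure the same scalar quantity with different noise levels, and Bayesian fusion weights each reading by its precision, so the more certain sensor dominates the result.

import numpy as np

def fuse_gaussian(means, variances, prior_mean=0.0, prior_var=1e6):
    # Posterior over a scalar quantity reported by several noisy sensors.
    # Sensor i reports means[i] with Gaussian noise variance variances[i].
    # With a (broad) Gaussian prior the posterior is again Gaussian and the
    # readings are combined precision-weighted: certain sensors count more.
    precisions = 1.0 / np.asarray(variances, dtype=float)
    prior_precision = 1.0 / prior_var
    post_precision = prior_precision + precisions.sum()
    post_mean = (prior_precision * prior_mean
                 + (precisions * np.asarray(means, dtype=float)).sum()) / post_precision
    return post_mean, 1.0 / post_precision

# A precise sensor (variance 0.1) and a vague one (variance 4.0):
mean, var = fuse_gaussian(means=[2.0, 5.0], variances=[0.1, 4.0])
print(mean, var)  # the fused estimate stays close to the precise sensor's reading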

Disadvantages of using probabilistic models


Inference in fully probabilistic models can be slow. Inferring probabilistic models inevitably requires modelling distributions; sometimes this is more than the task actually calls for.

Bayesian sensor fusion


was the topic of a talk I gave recently at the NCRG in Aston, with a pdf version of the slides being available here. I want to thank Peter Tino for the invitation and all attendees for their feedback.

The basic idea of Bayesian sensor fusion is to take the uncertainty of information into account. In machine learning the seminal papers were those by (MacKay 1992), who discussed the effects of model uncertainty. In (Wright 1999) these ideas were later extended to input uncertainty. Closely related is the work by (Dellaportas & Stephens 1995), who discuss models for errors in observables. I got interested in these issues in the context of hierarchical models where the model parameters of a feature extraction stage are used for diagnosis or segmentation purposes. Such models are used, for example, for sleep analysis or in the context of BCI. In a Bayesian sense these features are latent variables and should be treated as such. Again this is a consistency argument which has to be examined for its practical relevance. In order to obtain a hierarchical model that does sensor fusion, we simply regard the feature extraction stage as a latent space and integrate (marginalize) over all uncertainties. The left figure compares a sensor-fusing DAG with current practice in many applications of probabilistic modeling, which regards extracted features as observations. I reported on a first attempt to approach this problem in (Sykacek 1999), which is described in more detail in section 4 of my Ph.D. thesis.

In order to see that a latent feature space has practical advantages, we consider a very simple case where two sensors are fused in a naive Bayes manner to predict the posterior probability in a two-class problem. The model is similar to the one in the graph used above, however with two latent feature stages that are, conditional on the state of interest t, assumed to be independent. The plot on the right illustrates the effect of knowing one of the latent features with varying precision. Conditioning on a best estimate obviously results in probabilities that are independent of the precision; we hence obtain a flat line at probability P(2)=0.27. Marginalization modifies the probabilities, which can, as we see, also change the predicted state. We may thus expect improvements in cases where the precision of the distributions in the latent feature space varies; a toy version of this comparison is sketched below. We have successfully applied an HMM-based latent feature space model to classification of segments of multi-sensor time series. Such problems arise in clinical diagnosis (sleep classification) and in the context of brain computer interfaces (BCI). An MCMC implementation and evaluation on synthetic and BCI data has been published in (Sykacek & Roberts 2002a). Recently (Beal et al. 2002) have applied similar ideas to sensor fusion of audio and video data.
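The following is a minimal sketch of that comparison (my own toy numbers and a single uncertain Gaussian feature rather than the HMM-based model; it does not reproduce the P(2)=0.27 setting from the plot): plugging in the best estimate of a latent feature ignores its precision, whereas marginalising over its posterior widens the class-conditional densities and thus changes the class probabilities as the precision varies.

import numpy as np
from scipy.stats import norm

def class_posterior(mu_hat, precision, marginalise=True,
                    class_means=(-1.0, 1.0), class_std=1.0, priors=(0.5, 0.5)):
    # Two-class posterior when a scalar feature is only known up to a
    # Gaussian posterior N(mu_hat, 1/precision).
    # marginalise=False plugs in the best estimate mu_hat: the result is
    # independent of the precision (the "flat line" case).
    # marginalise=True integrates the latent feature out; for Gaussian
    # class-conditionals this simply adds 1/precision to their variance.
    extra_var = 0.0 if not marginalise else 1.0 / precision
    likelihoods = np.array([
        norm.pdf(mu_hat, loc=m, scale=np.sqrt(class_std ** 2 + extra_var))
        for m in class_means
    ])
    post = np.asarray(priors) * likelihoods
    return post / post.sum()

for prec in (0.1, 1.0, 10.0):
    print(prec,
          class_posterior(0.3, prec, marginalise=False),  # constant in prec
          class_posterior(0.3, prec, marginalise=True))   # varies with prec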

References

(MacKay 1992) D. J. C. MacKay. Bayesian interpolation. Neural Computation, pages 415-447, 1992.

(Bernardo & Smith 1994) J. M. Bernardo and A. F. M. Smith. Bayesian Theory. John Wiley & Sons, Chichester, UK, 1994.

(Dellaportas & Stephens 1995) P. Dellaportas and S. A. Stephens. Bayesian analysis of errors-in-variables regression models. Biometrics, 51:1085-1095, 1995.

(Wright 1999) W. A. Wright. Bayesian approach to neural-network modeling with input uncertainty. IEEE Transactions on Neural Networks, 10:1261-1270, 1999.

(Sykacek 1999) P. Sykacek. Learning from uncertain and probably missing data. OEFAI. An abstract is available on the workshop homepage.

(Beal et al. 2002) M. J. Beal, H. Attias and N. Jojic. A self-calibrating algorithm for speaker tracking based on audio-visual statistical models. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), May 2002.
