Abstract
The unsupervised learning of multivariate mixture models from on-line data streams has attracted the attention of researchers for its usefulness in real-time intelligent learning systems. Compared with some popular numerical methods, the EM algorithm is an ideal choice for iteratively obtaining maximum likelihood estimates of the parameters of a presumed finite mixture. However, the original EM is a batch algorithm that works only on fixed datasets. To endow the EM algorithm with the capability to process streaming data, two on-line variants are studied: Titterington’s method and a sufficient statistics-based method. We first prove that the two on-line EM variants are theoretically applicable to training the multivariate normal mixture model by showing that the model belongs to the exponential family. The two on-line learning schemes are then applied to the problems of background learning and moving foreground detection. Experiments show that the two on-line EM variants efficiently update the parameters of the mixture model and are capable of generating reliable backgrounds for moving foreground detection.
References
Figueiredo, M.A.T., Jain, A.K.: Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 381–396 (2002)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B, Stat. Methodol. 39(1), 1–38 (1977)
Wolfe, J.H.: Pattern clustering by multivariate mixture analysis. Multivar. Behav. Res. 5, 329–350 (1970)
Titterington, D.M.: Recursive parameter estimation using incomplete data. J. R. Stat. Soc., Ser. B, Methodol. 46(2), 257–267 (1984)
Fabian, V.: On asymptotically efficient recursive estimation. Ann. Stat. 6, 854–866 (1978)
Zivkovic, Z., van der Heijden, F.: Recursive unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 26(5), 651–656 (2004)
Neal, R., Hinton, G.: A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan, M.I. (ed.) Learning in Graphical Models, pp. 355–368. Kluwer Academic, Dordrecht (1998)
Cappe, O., Moulines, E.: On-line expectation-maximization algorithm for latent data models. J. R. Stat. Soc., Ser. B, Methodol. 71, 593–613 (2009)
Arcidiacono, P., Jones, J.B.: Finite mixture distributions, sequential likelihood and the EM algorithm. Econometrica 71(3), 933–946 (2003)
Xu, L., Jordan, M.: On convergence properties of the EM algorithm for Gaussian mixtures. Neural Comput. 8, 129–151 (1996)
Redner, R.A., Walker, H.F.: Mixture densities, maximum likelihood, and the EM algorithm. SIAM Rev. 26, 195–239 (1984)
Sato, M., Ishii, S.: On-line EM algorithm for the normalized Gaussian network. Neural Comput. 12, 407–432 (2000)
Makov, U.E., Smith, A.F.M.: A quasi-Bayes unsupervised learning procedure for priors. IEEE Trans. Inf. Theory 23(6), 761–764 (1977)
Smith, A.F.M., Makov, U.E.: A quasi-Bayes sequential procedure for mixtures. J. R. Stat. Soc., Ser. B, Methodol. 40, 106–112 (1978)
Wang, S., Zhao, Y.: Almost sure convergence of Titterington’s recursive estimator for mixture models. Stat. Probab. Lett. 76, 2001–2006 (2006)
Wren, C.R., Azarbayejani, A., Darrell, T., Pentland, A.P.: Pfinder: real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 780–785 (1997)
Stauffer, C., Grimson, W.E.: Adaptive background mixture models for real-time tracking. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 246–252 (1999)
Stauffer, C., Grimson, W.E.: Learning patterns of activity using real-time tracking. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 747–757 (2000)
Zivkovic, Z.: Improved adaptive Gaussian mixture model for background subtraction. In: Proc. 17th Int. Conf. Pattern Recognition, pp. 28–31 (2004)
Li, D., Xu, L., Goodman, E.: On-line background learning for illumination-robust foreground detection. In: Proc. 11th ICARCV, pp. 1093–1100 (2010)
Available at: http://www.cvg.rdg.ac.uk/slides/pets.html
Goyette, N., Jodoin, P.-M., Porikli, F., Konrad, J., Ishwar, P.: Changedetection.net: a new change detection benchmark dataset. In: Proc. IEEE Workshop on Change Detection (CDW’12) at CVPR’12, pp. 1–8 (2012)
Cheng, J., Yang, J., Zhou, Y., Cui, Y.: Flexible background mixture models for foreground segmentation. Image Vis. Comput. 24, 473–482 (2006)
Kim, K., Chalidabhongse, T.H., Harwood, D., Davis, L.: Real-time foreground-background segmentation using codebook model. Real-Time Imaging 11(3), 167–256 (2005)
Chang, F., Chen, C., Lu, C.: A linear-time component-labeling algorithm using contour tracing technique. Comput. Vis. Image Underst. 93(2), 206–220 (2004)
Joo, S., Chellappa, R.: A multiple-hypothesis approach for multiobject visual tracking. IEEE Trans. Image Process. 16(11), 2849–2854 (2007)
Appendices
Appendix A: Proof of Theorem 1
The complete-data density (11) can be transformed into:
Decomposing (45) into the following form:
Thus, according to Definition 2, it is obvious that \(f(\mathbf{x}|\boldsymbol{\theta})\) defined in (11) belongs to the exponential family.
Appendix B: The Conditional Expectation of Score Functions
According to the definition of conditional expectation:
The derivative with respect to \(\boldsymbol{\theta}\) is independent of the integral over \(\mathbf{y}\), so the two operations can be exchanged, and we have:
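For example, applied to the incomplete-data score \(\mathbf{v}_g(\mathbf{y},\boldsymbol{\theta}) = \partial \log g(\mathbf{y}|\boldsymbol{\theta}) / \partial \boldsymbol{\theta}\), this interchange yields the zero-mean property of scores invoked in (17):

\[
\mathbb{E}_{\boldsymbol{\theta}}\bigl[ \mathbf{v}_{g}( \mathbf{y}, \boldsymbol{\theta} ) \bigr]
= \int \frac{\partial \log g( \mathbf{y} | \boldsymbol{\theta} )}{\partial \boldsymbol{\theta}}\, g( \mathbf{y} | \boldsymbol{\theta} )\, d\mathbf{y}
= \frac{\partial}{\partial \boldsymbol{\theta}} \int g( \mathbf{y} | \boldsymbol{\theta} )\, d\mathbf{y}
= \mathbf{0}.
\]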
Appendix C: Proof of Theorem 2
According to (11), the logarithm of the complete-data density is:
The parameters to be estimated are \(\omega_j\), \(\boldsymbol{\mu}_j\) and \(\mathbf{C}_j\), which are mutually independent. Therefore the FIM of \(\boldsymbol{\theta}\) has diagonal form, and we are going to derive the components of \(I_c(\boldsymbol{\theta})\) separately in the following three parts.
C.1 Derivation of Titterington’s Equation for \(\omega_j\)
To compute \(I_c(\boldsymbol{\omega})\), without loss of generality, we take \(\omega_1\) as an example. By using the constraint \(\sum_{j = 1}^{K} \omega_{j} = 1\) and abbreviating \(p_{j}( \mathbf{y}|\boldsymbol{\theta}_{j} )\) to \(p_j\), Eq. (52) can be rewritten in two forms:
and
Taking the derivative with respect to \(\omega_1\) of (53) and (54), we obtain:
The Kronecker delta \(\delta_j\) is a discrete random variable with the distribution listed in Table 2.
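In our notation \(\delta_j\) indicates whether \(\mathbf{y}\) is generated by the \(j\)th component, so the distribution in Table 2 is presumably Bernoulli:

\[
\Pr( \delta_j = 1 ) = \omega_j, \qquad \Pr( \delta_j = 0 ) = 1 - \omega_j,
\]

so that \(\mathbb{E}_{\boldsymbol{\theta}}[ \delta_j ] = \omega_j\).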
From Table 2 we have \(\mathbb{E}_{\boldsymbol{\theta }}[ \mathbf{v}_{c}(\mathbf{y},\omega_{1}) ] = 0\), which satisfies the property of scores in (17). Then \(I_c(\omega_1)\) is constructed as:
Considering the generality of (53), (54) and (55), we have:
From (56) we can see that \(I_c(\omega_j)\) is a scalar. To obtain the incomplete-data score function \(\mathbf{v}_g(\mathbf{y},\omega_j)\), we modify the incomplete-data density \(g(\mathbf{y}|\boldsymbol{\theta})\) by adding a zero-valued term, analogous to the construction of a Lagrange function. The modified incomplete-data density is denoted by \(g_s(\mathbf{y}|\boldsymbol{\theta})\), and its value is always equal to \(g(\mathbf{y}|\boldsymbol{\theta})\):
Then it can be derived that:
According to the properties of a probability density function, the expectation of (58) is computed as:
which also satisfies (17). Now, introducing (56) and (58) into the recursive Eq. (13) yields the updating equation for \(\omega_j\) formulated in (21).
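In our notation, writing \(P^{(k)}( j \mid \mathbf{y}^{(k+1)} )\) for the posterior probability that the new observation \(\mathbf{y}^{(k+1)}\) belongs to component \(j\), a sketch of the resulting recursion is the standard stochastic-approximation form:

\[
\omega_j^{(k+1)} = \omega_j^{(k)} + \frac{1}{k+1} \Bigl( P^{(k)}( j \mid \mathbf{y}^{(k+1)} ) - \omega_j^{(k)} \Bigr),
\]

which preserves \(\sum_{j=1}^{K} \omega_j^{(k+1)} = 1\) automatically.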
C.2 Derivation of Titterington’s Equation for \(\boldsymbol{\mu}_j\)
For updating the mean vectors, \(I_c(\boldsymbol{\mu})\) should be obtained first. Taking the first derivative with respect to \(\boldsymbol{\mu}_j\) of (52):
It is obvious that \(\mathbb{E}_{\boldsymbol{\theta }}[ \mathbf{v}_{c}(\mathbf{y},\boldsymbol{\mu}_{j}) ] = 0\), which satisfies (17). Then \(I_c(\boldsymbol{\mu}_j)\) is given by:
Then the score function \(\mathbf{v}_g(\mathbf{y},\boldsymbol{\mu}_j)\) is constructed as follows:
The expectation of (61) is,
which also satisfies (17). Combining the results of (60) and (61) in (13), we obtain the recursive estimation of \(\boldsymbol{\mu}_j\), given in (22).
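Writing \(P^{(k)}( j \mid \mathbf{y}^{(k+1)} )\) for the posterior responsibility of component \(j\) for the new observation, a sketch of this recursion, in our notation, is:

\[
\boldsymbol{\mu}_j^{(k+1)} = \boldsymbol{\mu}_j^{(k)} + \frac{1}{k+1}\, \frac{P^{(k)}( j \mid \mathbf{y}^{(k+1)} )}{\omega_j^{(k)}} \bigl( \mathbf{y}^{(k+1)} - \boldsymbol{\mu}_j^{(k)} \bigr);
\]

the factor \(\mathbf{C}_j^{-1}\) appearing in both the FIM and the score cancels, which is why \(\mathbf{C}_j\) does not enter the gain.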
C.3 Derivation of Titterington’s Equation for \(\mathbf{C}_j\)
As the behavior of Titterington’s method is not clear when it is given score functions with respect to a matrix, generalizing the method to estimate parameters in matrix form can be problematic. Moreover, we have found that although the scores \(\mathbf{v}_g(\mathbf{y},\mathbf{C}_j)\) and \(\mathbf{v}_c(\mathbf{y},\mathbf{C}_j)\) satisfy (17), \(I_c(\mathbf{C}_j)\) cannot be directly computed (see Sect. 3.1). We therefore simplify \(\mathbf{C}_j\) by assuming that any two elements of \(\mathbf{y}\) are mutually independent, in which case \(\mathbf{C}_j\) becomes diagonal. Define \(\mathbf{y} = [ y_1, \ldots, y_m ]^{T}\), \(\boldsymbol{\mu}_j = [ \mu_{j1}, \ldots, \mu_{jm} ]^{T}\), and \(\mathbf{C}_j\) is given as:
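Presumably, with \(\phi_{ji}\) denoting the variance of the \(i\)th coordinate under the \(j\)th component (the notation used below),

\[
\mathbf{C}_j = \operatorname{diag}( \phi_{j1}, \phi_{j2}, \ldots, \phi_{jm} ).
\]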
Now \(p_j\) becomes a product of \(m\) independent univariate normal densities.
Then we have:
It is easy to verify that \(\mathbb{E}_{\boldsymbol{\theta }}[ \mathbf{v}_{c}(\mathbf{y},\phi_{ji}) ] = 0\), and we then compute the FIM of \(\phi_{ji}\) as:
It is known that \(( y_{i} - \mu_{ji} ) / \sqrt{\phi_{ji}} \sim\mathcal{N}( 0,1 )\), in which \(\mathcal{N}( 0,1 )\) denotes the standard univariate normal distribution. Defining \(z = ( y_{i} - \mu_{ji} ) / \sqrt{\phi_{ji}}\), \(z^2\) obeys the \(\chi^2\)-distribution with 1 degree of freedom. According to the properties of the \(\chi^2\)-distribution, \(\mathbb{E}( z^{2} ) = 1\) and \(\mathbb{D}( z^{2} ) = 2\), thus:
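In particular, the moment needed for the expectation here is

\[
\mathbb{E}\bigl[ ( z^{2} - 1 )^{2} \bigr] = \mathbb{D}( z^{2} ) + \bigl( \mathbb{E}( z^{2} ) - 1 \bigr)^{2} = 2 .
\]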
Therefore,
We also derive that:
Note that the expectation of \(\mathbf{v}_g(\mathbf{y},\phi_{ji})\) also satisfies (17). Now, combining (68) and (69) and placing the \(\phi_{ji}\) on the diagonal of \(\mathbf{C}_j\), we obtain
which is actually equivalent to (23), and the proof of Theorem 2 is complete.
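The three updates derived in this appendix can be illustrated in code. The sketch below is our own minimal illustration under the diagonal-covariance assumption; the variable names (`omega`, `mu`, `phi`) and the single-observation interface are assumptions, and the decaying gain \(1/(k+1)\) follows the recursion (13).

```python
import numpy as np

def titterington_step(x, k, omega, mu, phi):
    """One recursive update of a K-component, diagonal-covariance normal
    mixture from a single observation x (shape (m,)).
    omega: (K,) weights; mu, phi: (K, m) means and diagonal variances."""
    # E-step: posterior responsibilities P(j | x) under the current model
    log_p = -0.5 * np.sum(np.log(2.0 * np.pi * phi) + (x - mu) ** 2 / phi, axis=1)
    w = omega * np.exp(log_p - log_p.max())   # subtract max for numerical stability
    r = w / w.sum()
    # Titterington-style recursion with decaying gain 1/(k+1)
    lr = 1.0 / (k + 1.0)
    g = (r / omega)[:, None]                  # per-component factor P(j|x)/omega_j
    omega_new = omega + lr * (r - omega)      # weight update
    mu_new = mu + lr * g * (x - mu)           # mean update
    phi_new = phi + lr * g * ((x - mu) ** 2 - phi)  # diagonal-variance update
    return omega_new, mu_new, phi_new
```

Note that the weight update keeps the weights summing to one, and the variance update keeps \(\phi_{ji} > 0\) as long as the effective gain stays below one.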
Appendix D: Proof of Theorem 3
By removing the redundant elements of the sufficient statistic vector formulated in (51) for the complete-data likelihood function of a multivariate normal mixture, the sufficient statistic vector \(\mathbf{s}_j(\mathbf{x})\), which comprises three elements for the \(j\)th member of the mixture, is given by:
Then we can obtain the conditional expected values of the three statistics under a batch setting:
By introducing the above equations into the original batch EM recursions (7)–(9) to represent the parameters of a multivariate normal mixture, we obtain:
and for \(\mathbf{C}_j\) we have:
By introducing (76) to replace \(\boldsymbol{\mu}_j\) with sufficient statistics,
The parametric representation of the sufficient statistics is easy to derive from (75)–(77):
Equations (78)–(80) give the relation between the current estimated parameters and the current sufficient statistics. The exponential forgetting technique is then applied to update the three sufficient statistics:
Recalling (75)–(77), at iteration k+1 we have,
Considering (78)–(80), we approximate \(s_{j,1}^{(k + 1)}\) with \(\omega_{j}^{(k + 1)}\), and \(s_{j,2}^{(k + 1)}\) with \(\omega_{j}^{(k + 1)}\boldsymbol{\mu}_{j}^{(k + 1)}\). After substituting all conditional expected values of the three statistics with the parameter values at epochs \(k\) and \(k+1\), (40)–(42) are obtained and the proof is complete. A similar proof for applying this on-line EM algorithm to a mixture model of Poisson distributions is given in [8].
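As an illustration of the scheme proved here, the following sketch maintains the three sufficient statistics with exponential forgetting and recovers the parameters from them via (78)–(80). The variable names, the forgetting rate `gamma`, and the initialization convention are our own assumptions.

```python
import numpy as np

def suffstat_em_step(x, s1, s2, s3, gamma=0.02):
    """One on-line EM update from a single observation x (shape (m,)) based on
    exponential forgetting of the three sufficient statistics:
    s1 ~ E[delta_j] (K,), s2 ~ E[delta_j y] (K, m), s3 ~ E[delta_j y y^T] (K, m, m)."""
    K, m = s2.shape
    # recover parameters from the current statistics
    omega = s1 / s1.sum()
    mu = s2 / s1[:, None]
    C = s3 / s1[:, None, None] - mu[:, :, None] * mu[:, None, :]
    # E-step: responsibilities of x under the current parameters
    r = np.empty(K)
    for j in range(K):
        d = x - mu[j]
        r[j] = omega[j] * np.exp(-0.5 * d @ np.linalg.solve(C[j], d)) \
               / np.sqrt(np.linalg.det(2.0 * np.pi * C[j]))
    r /= r.sum()
    # exponential forgetting of the statistics (forgetting rate gamma)
    s1 = (1.0 - gamma) * s1 + gamma * r
    s2 = (1.0 - gamma) * s2 + gamma * r[:, None] * x
    s3 = (1.0 - gamma) * s3 + gamma * r[:, None, None] * np.outer(x, x)
    return s1, s2, s3
```

With `gamma` constant, old observations are down-weighted geometrically, which is what allows the background model to adapt to scene changes.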
Li, D., Xu, L. & Goodman, E. On-line EM Variants for Multivariate Normal Mixture Model in Background Learning and Moving Foreground Detection. J Math Imaging Vis 48, 114–133 (2014). https://doi.org/10.1007/s10851-012-0403-6